The patch titled
Subject: Revert "kernel/reboot.c: convert simple_strtoul to kstrtoint"
has been removed from the -mm tree. Its filename was
revert-kernel-rebootc-convert-simple_strtoul-to-kstrtoint.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Matteo Croce <mcroce(a)microsoft.com>
Subject: Revert "kernel/reboot.c: convert simple_strtoul to kstrtoint"
Patch series "fix parsing of reboot= cmdline", v3.
The parsing of the reboot= cmdline has two major errors:
- a missing bound check can crash the system on reboot
- parsing of the cpu number only works if specified last
Fix both.
This patch (of 2):
This reverts commit 616feab753972b97.
kstrtoint() and simple_strtoul() have a subtle difference which makes them
non interchangeable: if a non digit character is found amid the parsing,
the former will return an error, while the latter will just stop parsing,
e.g. simple_strtoul("123xyx") = 123.
The kernel cmdline reboot= argument allows to specify the CPU used for
rebooting, with the syntax `s####` among the other flags, e.g.
"reboot=warm,s31,force", so if this flag is not the last given, it's
silently ignored as well as the subsequent ones.
Link: https://lkml.kernel.org/r/20201103214025.116799-2-mcroce@linux.microsoft.com
Fixes: 616feab75397 ("kernel/reboot.c: convert simple_strtoul to kstrtoint")
Signed-off-by: Matteo Croce <mcroce(a)microsoft.com>
Cc: Guenter Roeck <linux(a)roeck-us.net>
Cc: Petr Mladek <pmladek(a)suse.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Pavel Tatashin <pasha.tatashin(a)soleen.com>
Cc: Robin Holt <robinmholt(a)gmail.com>
Cc: Fabian Frederick <fabf(a)skynet.be>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/reboot.c | 21 +++++++--------------
1 file changed, 7 insertions(+), 14 deletions(-)
--- a/kernel/reboot.c~revert-kernel-rebootc-convert-simple_strtoul-to-kstrtoint
+++ a/kernel/reboot.c
@@ -551,22 +551,15 @@ static int __init reboot_setup(char *str
break;
case 's':
- {
- int rc;
-
- if (isdigit(*(str+1))) {
- rc = kstrtoint(str+1, 0, &reboot_cpu);
- if (rc)
- return rc;
- } else if (str[1] == 'm' && str[2] == 'p' &&
- isdigit(*(str+3))) {
- rc = kstrtoint(str+3, 0, &reboot_cpu);
- if (rc)
- return rc;
- } else
+ if (isdigit(*(str+1)))
+ reboot_cpu = simple_strtoul(str+1, NULL, 0);
+ else if (str[1] == 'm' && str[2] == 'p' &&
+ isdigit(*(str+3)))
+ reboot_cpu = simple_strtoul(str+3, NULL, 0);
+ else
*mode = REBOOT_SOFT;
break;
- }
+
case 'g':
*mode = REBOOT_GPIO;
break;
_
Patches currently in -mm which might be from mcroce(a)microsoft.com are
reboot-refactor-and-comment-the-cpu-selection-code.patch
reboot-allow-to-specify-reboot-mode-via-sysfs.patch
reboot-remove-cf9_safe-from-allowed-types-and-rename-cf9_force.patch
The patch titled
Subject: compiler.h: fix barrier_data() on clang
has been removed from the -mm tree. Its filename was
compilerh-fix-barrier_data-on-clang.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Arvind Sankar <nivedita(a)alum.mit.edu>
Subject: compiler.h: fix barrier_data() on clang
Commit 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h
mutually exclusive") neglected to copy barrier_data() from compiler-gcc.h
into compiler-clang.h. The definition in compiler-gcc.h was really to
work around clang's more aggressive optimization, so this broke
barrier_data() on clang, and consequently memzero_explicit() as well.
For example, this results in at least the memzero_explicit() call in
lib/crypto/sha256.c:sha256_transform() being optimized away by clang.
Fix this by moving the definition of barrier_data() into compiler.h.
Also move the gcc/clang definition of barrier() into compiler.h,
__memory_barrier() is icc-specific (and barrier() is already defined using
it in compiler-intel.h) and doesn't belong in compiler.h.
[rdunlap(a)infradead.org: fix ALPHA builds when SMP is not enabled]
Link: https://lkml.kernel.org/r/20201101231835.4589-1-rdunlap@infradead.org
Link: https://lkml.kernel.org/r/20201014212631.207844-1-nivedita@alum.mit.edu
Fixes: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
Signed-off-by: Arvind Sankar <nivedita(a)alum.mit.edu>
Signed-off-by: Randy Dunlap <rdunlap(a)infradead.org>
Reviewed-by: Nick Desaulniers <ndesaulniers(a)google.com>
Tested-by: Nick Desaulniers <ndesaulniers(a)google.com>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/asm-generic/barrier.h | 1 +
include/linux/compiler-clang.h | 6 ------
include/linux/compiler-gcc.h | 19 -------------------
include/linux/compiler.h | 18 ++++++++++++++++--
4 files changed, 17 insertions(+), 27 deletions(-)
--- a/include/asm-generic/barrier.h~compilerh-fix-barrier_data-on-clang
+++ a/include/asm-generic/barrier.h
@@ -13,6 +13,7 @@
#ifndef __ASSEMBLY__
+#include <linux/compiler.h>
#include <asm/rwonce.h>
#ifndef nop
--- a/include/linux/compiler-clang.h~compilerh-fix-barrier_data-on-clang
+++ a/include/linux/compiler-clang.h
@@ -60,12 +60,6 @@
#define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
#endif
-/* The following are for compatibility with GCC, from compiler-gcc.h,
- * and may be redefined here because they should not be shared with other
- * compilers, like ICC.
- */
-#define barrier() __asm__ __volatile__("" : : : "memory")
-
#if __has_feature(shadow_call_stack)
# define __noscs __attribute__((__no_sanitize__("shadow-call-stack")))
#endif
--- a/include/linux/compiler-gcc.h~compilerh-fix-barrier_data-on-clang
+++ a/include/linux/compiler-gcc.h
@@ -15,25 +15,6 @@
# error Sorry, your version of GCC is too old - please use 4.9 or newer.
#endif
-/* Optimization barrier */
-
-/* The "volatile" is due to gcc bugs */
-#define barrier() __asm__ __volatile__("": : :"memory")
-/*
- * This version is i.e. to prevent dead stores elimination on @ptr
- * where gcc and llvm may behave differently when otherwise using
- * normal barrier(): while gcc behavior gets along with a normal
- * barrier(), llvm needs an explicit input variable to be assumed
- * clobbered. The issue is as follows: while the inline asm might
- * access any memory it wants, the compiler could have fit all of
- * @ptr into memory registers instead, and since @ptr never escaped
- * from that, it proved that the inline asm wasn't touching any of
- * it. This version works well with both compilers, i.e. we're telling
- * the compiler that the inline asm absolutely may see the contents
- * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495
- */
-#define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory")
-
/*
* This macro obfuscates arithmetic on a variable address so that gcc
* shouldn't recognize the original var, and make assumptions about it.
--- a/include/linux/compiler.h~compilerh-fix-barrier_data-on-clang
+++ a/include/linux/compiler.h
@@ -80,11 +80,25 @@ void ftrace_likely_update(struct ftrace_
/* Optimization barrier */
#ifndef barrier
-# define barrier() __memory_barrier()
+/* The "volatile" is due to gcc bugs */
+# define barrier() __asm__ __volatile__("": : :"memory")
#endif
#ifndef barrier_data
-# define barrier_data(ptr) barrier()
+/*
+ * This version is i.e. to prevent dead stores elimination on @ptr
+ * where gcc and llvm may behave differently when otherwise using
+ * normal barrier(): while gcc behavior gets along with a normal
+ * barrier(), llvm needs an explicit input variable to be assumed
+ * clobbered. The issue is as follows: while the inline asm might
+ * access any memory it wants, the compiler could have fit all of
+ * @ptr into memory registers instead, and since @ptr never escaped
+ * from that, it proved that the inline asm wasn't touching any of
+ * it. This version works well with both compilers, i.e. we're telling
+ * the compiler that the inline asm absolutely may see the contents
+ * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495
+ */
+# define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory")
#endif
/* workaround for GCC PR82365 if needed */
_
Patches currently in -mm which might be from nivedita(a)alum.mit.edu are
The patch titled
Subject: mm/gup: use unpin_user_pages() in __gup_longterm_locked()
has been removed from the -mm tree. Its filename was
mm-gup-use-unpin_user_pages-in-__gup_longterm_locked.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Jason Gunthorpe <jgg(a)nvidia.com>
Subject: mm/gup: use unpin_user_pages() in __gup_longterm_locked()
When FOLL_PIN is passed to __get_user_pages() the page list must be put
back using unpin_user_pages() otherwise the page pin reference persists in
a corrupted state.
There are two places in the unwind of __gup_longterm_locked() that put the
pages back without checking. Normally on error this function would return
the partial page list making this the caller's responsibility, but in
these two cases the caller is not allowed to see these pages at all.
Link: https://lkml.kernel.org/r/0-v2-3ae7d9d162e2+2a7-gup_cma_fix_jgg@nvidia.com
Fixes: 3faa52c03f44 ("mm/gup: track FOLL_PIN pages")
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
Reported-by: Ira Weiny <ira.weiny(a)intel.com>
Reviewed-by: Ira Weiny <ira.weiny(a)intel.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
--- a/mm/gup.c~mm-gup-use-unpin_user_pages-in-__gup_longterm_locked
+++ a/mm/gup.c
@@ -1647,8 +1647,11 @@ check_again:
/*
* drop the above get_user_pages reference.
*/
- for (i = 0; i < nr_pages; i++)
- put_page(pages[i]);
+ if (gup_flags & FOLL_PIN)
+ unpin_user_pages(pages, nr_pages);
+ else
+ for (i = 0; i < nr_pages; i++)
+ put_page(pages[i]);
if (migrate_pages(&cma_page_list, alloc_migration_target, NULL,
(unsigned long)&mtc, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
@@ -1728,8 +1731,11 @@ static long __gup_longterm_locked(struct
goto out;
if (check_dax_vmas(vmas_tmp, rc)) {
- for (i = 0; i < rc; i++)
- put_page(pages[i]);
+ if (gup_flags & FOLL_PIN)
+ unpin_user_pages(pages, rc);
+ else
+ for (i = 0; i < rc; i++)
+ put_page(pages[i]);
rc = -EOPNOTSUPP;
goto out;
}
_
Patches currently in -mm which might be from jgg(a)nvidia.com are
mm-reorganize-internal_get_user_pages_fast.patch
mm-prevent-gup_fast-from-racing-with-cow-during-fork.patch
The patch titled
Subject: mm/slub: fix panic in slab_alloc_node()
has been removed from the -mm tree. Its filename was
mm-slub-fix-panic-in-slab_alloc_node.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Laurent Dufour <ldufour(a)linux.ibm.com>
Subject: mm/slub: fix panic in slab_alloc_node()
While doing memory hot-unplug operation on a PowerPC VM running 1024 CPUs
with 11TB of ram, I hit the following panic:
BUG: Kernel NULL pointer dereference on read at 0x00000007
Faulting instruction address: 0xc000000000456048
Oops: Kernel access of bad area, sig: 11 [#2]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS= 2048 NUMA pSeries
Modules linked in: rpadlpar_io rpaphp
CPU: 160 PID: 1 Comm: systemd Tainted: G D 5.9.0 #1
NIP: c000000000456048 LR: c000000000455fd4 CTR: c00000000047b350
REGS: c00006028d1b77a0 TRAP: 0300 Tainted: G D (5.9.0)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24004228 XER: 00000000
CFAR: c00000000000f1b0 DAR: 0000000000000007 DSISR: 40000000 IRQMASK: 0
GPR00: c000000000455fd4 c00006028d1b7a30 c000000001bec800 0000000000000000
GPR04: 0000000000000dc0 0000000000000000 00000000000374ef c00007c53df99320
GPR08: 000007c53c980000 0000000000000000 000007c53c980000 0000000000000000
GPR12: 0000000000004400 c00000001e8e4400 0000000000000000 0000000000000f6a
GPR16: 0000000000000000 c000000001c25930 c000000001d62528 00000000000000c1
GPR20: c000000001d62538 c00006be469e9000 0000000fffffffe0 c0000000003c0ff8
GPR24: 0000000000000018 0000000000000000 0000000000000dc0 0000000000000000
GPR28: c00007c513755700 c000000001c236a4 c00007bc4001f800 0000000000000001
NIP [c000000000456048] __kmalloc_node+0x108/0x790
LR [c000000000455fd4] __kmalloc_node+0x94/0x790
Call Trace:
[c00006028d1b7a30] [c00007c51af92000] 0xc00007c51af92000 (unreliable)
[c00006028d1b7aa0] [c0000000003c0ff8] kvmalloc_node+0x58/0x110
[c00006028d1b7ae0] [c00000000047b45c] mem_cgroup_css_online+0x10c/0x270
[c00006028d1b7b30] [c000000000241fd8] online_css+0x48/0xd0
[c00006028d1b7b60] [c00000000024af14] cgroup_apply_control_enable+0x2c4/0x470
[c00006028d1b7c40] [c00000000024e838] cgroup_mkdir+0x408/0x5f0
[c00006028d1b7cb0] [c0000000005a4ef0] kernfs_iop_mkdir+0x90/0x100
[c00006028d1b7cf0] [c0000000004b8168] vfs_mkdir+0x138/0x250
[c00006028d1b7d40] [c0000000004baf04] do_mkdirat+0x154/0x1c0
[c00006028d1b7dc0] [c000000000032b38] system_call_exception+0xf8/0x200
[c00006028d1b7e20] [c00000000000c740] system_call_common+0xf0/0x27c
Instruction dump:
e93e0000 e90d0030 39290008 7cc9402a e94d0030 e93e0000 7ce95214 7f89502a
2fbc0000 419e0018 41920230 e9270010 <89290007> 7f994800 419e0220 7ee6bb78
This pointing to the following code:
mm/slub.c:2851
if (unlikely(!object || !node_match(page, node))) {
c000000000456038: 00 00 bc 2f cmpdi cr7,r28,0
c00000000045603c: 18 00 9e 41 beq cr7,c000000000456054 <__kmalloc_node+0x114>
node_match():
mm/slub.c:2491
if (node != NUMA_NO_NODE && page_to_nid(page) != node)
c000000000456040: 30 02 92 41 beq cr4,c000000000456270 <__kmalloc_node+0x330>
page_to_nid():
include/linux/mm.h:1294
c000000000456044: 10 00 27 e9 ld r9,16(r7)
c000000000456048: 07 00 29 89 lbz r9,7(r9) <<<< r9 = NULL
node_match():
mm/slub.c:2491
c00000000045604c: 00 48 99 7f cmpw cr7,r25,r9
c000000000456050: 20 02 9e 41 beq cr7,c000000000456270 <__kmalloc_node+0x330>
The panic occurred in slab_alloc_node() when checking for the page's node:
object = c->freelist;
page = c->page;
if (unlikely(!object || !node_match(page, node))) {
object = __slab_alloc(s, gfpflags, node, addr, c);
stat(s, ALLOC_SLOWPATH);
The issue is that object is not NULL while page is NULL which is odd but
may happen if the cache flush happened after loading object but before
loading page. Thus checking for the page pointer is required too.
The cache flush is done through an inter processor interrupt when a piece
of memory is off-lined. That interrupt is triggered when a memory
hot-unplug operation is initiated and offline_pages() is calling the
slub's MEM_GOING_OFFLINE callback slab_mem_going_offline_callback() which
is calling flush_cpu_slab(). If that interrupt is caught between the
reading of c->freelist and the reading of c->page, this could lead to such
a situation. That situation is expected and the later call to
this_cpu_cmpxchg_double() will detect the change to c->freelist and redo
the whole operation.
In commit 6159d0f5c03e ("mm/slub.c: page is always non-NULL in
node_match()") check on the page pointer has been removed assuming that
page is always valid when it is called. It happens that this is not true
in that particular case, so check for page before calling node_match()
here.
Link: https://lkml.kernel.org/r/20201027190406.33283-1-ldufour@linux.ibm.com
Fixes: 6159d0f5c03e ("mm/slub.c: page is always non-NULL in node_match()")
Signed-off-by: Laurent Dufour <ldufour(a)linux.ibm.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Christoph Lameter <cl(a)linux.com>
Cc: Wei Yang <richard.weiyang(a)gmail.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Nathan Lynch <nathanl(a)linux.ibm.com>
Cc: Scott Cheloha <cheloha(a)linux.ibm.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/slub.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/slub.c~mm-slub-fix-panic-in-slab_alloc_node
+++ a/mm/slub.c
@@ -2852,7 +2852,7 @@ redo:
object = c->freelist;
page = c->page;
- if (unlikely(!object || !node_match(page, node))) {
+ if (unlikely(!object || !page || !node_match(page, node))) {
object = __slab_alloc(s, gfpflags, node, addr, c);
} else {
void *next_object = get_freepointer_safe(s, object);
_
Patches currently in -mm which might be from ldufour(a)linux.ibm.com are
The patch titled
Subject: mm/vmscan: fix NR_ISOLATED_FILE corruption on 64-bit
has been removed from the -mm tree. Its filename was
mm-vmscan-fix-nr_isolated_file-corruption-on-64-bit.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Nicholas Piggin <npiggin(a)gmail.com>
Subject: mm/vmscan: fix NR_ISOLATED_FILE corruption on 64-bit
Previously the negated unsigned long would be cast back to signed long
which would have the correct negative value. After commit 730ec8c01a2b
("mm/vmscan.c: change prototype for shrink_page_list"), the large unsigned
int converts to a large positive signed long.
Symptoms include CMA allocations hanging forever holding the cma_mutex due
to alloc_contig_range->...->isolate_migratepages_block waiting forever in
"while (unlikely(too_many_isolated(pgdat)))".
[akpm(a)linux-foundation.org: fix -stat.nr_lazyfree_fail as well, per Michal]
Link: https://lkml.kernel.org/r/20201029032320.1448441-1-npiggin@gmail.com
Fixes: 730ec8c01a2b ("mm/vmscan.c: change prototype for shrink_page_list")
Signed-off-by: Nicholas Piggin <npiggin(a)gmail.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Vaneet Narang <v.narang(a)samsung.com>
Cc: Maninder Singh <maninder1.s(a)samsung.com>
Cc: Amit Sahrawat <a.sahrawat(a)samsung.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/mm/vmscan.c~mm-vmscan-fix-nr_isolated_file-corruption-on-64-bit
+++ a/mm/vmscan.c
@@ -1516,7 +1516,8 @@ unsigned int reclaim_clean_pages_from_li
nr_reclaimed = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
TTU_IGNORE_ACCESS, &stat, true);
list_splice(&clean_pages, page_list);
- mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -nr_reclaimed);
+ mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE,
+ -(long)nr_reclaimed);
/*
* Since lazyfree pages are isolated from file LRU from the beginning,
* they will rotate back to anonymous LRU in the end if it failed to
@@ -1526,7 +1527,7 @@ unsigned int reclaim_clean_pages_from_li
mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_ANON,
stat.nr_lazyfree_fail);
mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE,
- -stat.nr_lazyfree_fail);
+ -(long)stat.nr_lazyfree_fail);
return nr_reclaimed;
}
_
Patches currently in -mm which might be from npiggin(a)gmail.com are
The patch titled
Subject: mm/compaction: stop isolation if too many pages are isolated and we have pages to migrate
has been removed from the -mm tree. Its filename was
mm-compaction-stop-isolation-if-too-many-pages-are-isolated-and-we-have-pages-to-migrate.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Zi Yan <ziy(a)nvidia.com>
Subject: mm/compaction: stop isolation if too many pages are isolated and we have pages to migrate
In isolate_migratepages_block, if we have too many isolated pages and
nr_migratepages is not zero, we should try to migrate what we have without
wasting time on isolating.
In theory it's possible that multiple parallel compactions will cause
too_many_isolated() to become true even if each has isolated less than
COMPACT_CLUSTER_MAX, and loop forever in the while loop. Bailing
immediately prevents that.
[vbabka(a)suse.cz: changelog addition]
Link: https://lkml.kernel.org/r/20201030183809.3616803-2-zi.yan@sent.com
Fixes: 1da2f328fa64 (“mm,thp,compaction,cma: allow THP migration for CMA allocations”)
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Suggested-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/compaction.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/compaction.c~mm-compaction-stop-isolation-if-too-many-pages-are-isolated-and-we-have-pages-to-migrate
+++ a/mm/compaction.c
@@ -817,6 +817,10 @@ isolate_migratepages_block(struct compac
* delay for some time until fewer pages are isolated
*/
while (unlikely(too_many_isolated(pgdat))) {
+ /* stop isolation if there are still pages not migrated */
+ if (cc->nr_migratepages)
+ return 0;
+
/* async migration should just abort */
if (cc->mode == MIGRATE_ASYNC)
return 0;
_
Patches currently in -mm which might be from ziy(a)nvidia.com are
The patch titled
Subject: mm/compaction: count pages and stop correctly during page isolation
has been removed from the -mm tree. Its filename was
mm-compaction-count-pages-and-stop-correctly-during-page-isolation.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Zi Yan <ziy(a)nvidia.com>
Subject: mm/compaction: count pages and stop correctly during page isolation
In isolate_migratepages_block, when cc->alloc_contig is true, we are able
to isolate compound pages, nr_migratepages and nr_isolated did not count
compound pages correctly, causing us to isolate more pages than we
thought. Count compound pages as the number of base pages they contain.
Otherwise, we might be trapped in too_many_isolated while loop, since the
actual isolated pages can go up to COMPACT_CLUSTER_MAX*512=16384, where
COMPACT_CLUSTER_MAX is 32, since we stop isolation after
cc->nr_migratepages reaches to COMPACT_CLUSTER_MAX.
In addition, after we fix the issue above, cc->nr_migratepages could never
be equal to COMPACT_CLUSTER_MAX if compound pages are isolated, thus page
isolation could not stop as we intended. Change the isolation stop
condition to >=.
The issue can be triggered as follows: In a system with 16GB memory and an
8GB CMA region reserved by hugetlb_cma, if we first allocate 10GB THPs and
mlock them (so some THPs are allocated in the CMA region and mlocked),
reserving 6 1GB hugetlb pages via
/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages will get stuck
(looping in too_many_isolated function) until we kill either task. With
the patch applied, oom will kill the application with 10GB THPs and let
hugetlb page reservation finish.
[ziy(a)nvidia.com: v3]
Link: https://lkml.kernel.org/r/20201030183809.3616803-1-zi.yan@sent.com
Link: https://lkml.kernel.org/r/20201029200435.3386066-1-zi.yan@sent.com
Fixes: 1da2f328fa64 ("cmm,thp,compaction,cma: allow THP migration for CMA allocations")
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/compaction.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/mm/compaction.c~mm-compaction-count-pages-and-stop-correctly-during-page-isolation
+++ a/mm/compaction.c
@@ -1012,8 +1012,8 @@ isolate_migratepages_block(struct compac
isolate_success:
list_add(&page->lru, &cc->migratepages);
- cc->nr_migratepages++;
- nr_isolated++;
+ cc->nr_migratepages += compound_nr(page);
+ nr_isolated += compound_nr(page);
/*
* Avoid isolating too much unless this block is being
@@ -1021,7 +1021,7 @@ isolate_success:
* or a lock is contended. For contention, isolate quickly to
* potentially remove one source of contention.
*/
- if (cc->nr_migratepages == COMPACT_CLUSTER_MAX &&
+ if (cc->nr_migratepages >= COMPACT_CLUSTER_MAX &&
!cc->rescan && !cc->contended) {
++low_pfn;
break;
@@ -1132,7 +1132,7 @@ isolate_migratepages_range(struct compac
if (!pfn)
break;
- if (cc->nr_migratepages == COMPACT_CLUSTER_MAX)
+ if (cc->nr_migratepages >= COMPACT_CLUSTER_MAX)
break;
}
_
Patches currently in -mm which might be from ziy(a)nvidia.com are
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d3938ee23e97bfcac2e0eb6b356875da73d700df Mon Sep 17 00:00:00 2001
From: Gao Xiang <hsiangkao(a)redhat.com>
Date: Sun, 1 Nov 2020 03:51:02 +0800
Subject: [PATCH] erofs: derive atime instead of leaving it empty
EROFS has _only one_ ondisk timestamp (ctime is currently
documented and recorded, we might also record mtime instead
with a new compat feature if needed) for each extended inode
since EROFS isn't mainly for archival purposes so no need to
keep all timestamps on disk especially for Android scenarios
due to security concerns. Also, romfs/cramfs don't have their
own on-disk timestamp, and squashfs only records mtime instead.
Let's also derive access time from ondisk timestamp rather than
leaving it empty, and if mtime/atime for each file are really
needed for specific scenarios as well, we can also use xattrs
to record them then.
Link: https://lore.kernel.org/r/20201031195102.21221-1-hsiangkao@aol.com
[ Gao Xiang: It'd be better to backport for user-friendly concern. ]
Fixes: 431339ba9042 ("staging: erofs: add inode operations")
Cc: stable <stable(a)vger.kernel.org> # 4.19+
Reported-by: nl6720 <nl6720(a)gmail.com>
Reviewed-by: Chao Yu <yuchao0(a)huawei.com>
Signed-off-by: Gao Xiang <hsiangkao(a)redhat.com>
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index 139d0bed42f8..3e21c0e8adae 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -107,11 +107,9 @@ static struct page *erofs_read_inode(struct inode *inode,
i_gid_write(inode, le32_to_cpu(die->i_gid));
set_nlink(inode, le32_to_cpu(die->i_nlink));
- /* ns timestamp */
- inode->i_mtime.tv_sec = inode->i_ctime.tv_sec =
- le64_to_cpu(die->i_ctime);
- inode->i_mtime.tv_nsec = inode->i_ctime.tv_nsec =
- le32_to_cpu(die->i_ctime_nsec);
+ /* extended inode has its own timestamp */
+ inode->i_ctime.tv_sec = le64_to_cpu(die->i_ctime);
+ inode->i_ctime.tv_nsec = le32_to_cpu(die->i_ctime_nsec);
inode->i_size = le64_to_cpu(die->i_size);
@@ -149,11 +147,9 @@ static struct page *erofs_read_inode(struct inode *inode,
i_gid_write(inode, le16_to_cpu(dic->i_gid));
set_nlink(inode, le16_to_cpu(dic->i_nlink));
- /* use build time to derive all file time */
- inode->i_mtime.tv_sec = inode->i_ctime.tv_sec =
- sbi->build_time;
- inode->i_mtime.tv_nsec = inode->i_ctime.tv_nsec =
- sbi->build_time_nsec;
+ /* use build time for compact inodes */
+ inode->i_ctime.tv_sec = sbi->build_time;
+ inode->i_ctime.tv_nsec = sbi->build_time_nsec;
inode->i_size = le32_to_cpu(dic->i_size);
if (erofs_inode_is_data_compressed(vi->datalayout))
@@ -167,6 +163,11 @@ static struct page *erofs_read_inode(struct inode *inode,
goto err_out;
}
+ inode->i_mtime.tv_sec = inode->i_ctime.tv_sec;
+ inode->i_atime.tv_sec = inode->i_ctime.tv_sec;
+ inode->i_mtime.tv_nsec = inode->i_ctime.tv_nsec;
+ inode->i_atime.tv_nsec = inode->i_ctime.tv_nsec;
+
if (!nblks)
/* measure inode.i_blocks as generic filesystems */
inode->i_blocks = roundup(inode->i_size, EROFS_BLKSIZ) >> 9;
From: Aili Yao <yaoaili(a)kingsoft.com>
>From commit 6915564dc5a8 ("ACPI: OSL: Change the type of
acpi_os_map_generic_address() return value"),
acpi_os_map_generic_address() will return logical address or NULL for
error, but for ACPI_ADR_SPACE_SYSTEM_IO case, it should be also return 0
as it's a normal case, but now it will return -ENXIO. So check it out for
such case to avoid einj module initialization fail.
Fixes: 6915564dc5a8 ("ACPI: OSL: Change the type of
acpi_os_map_generic_address() return value")
Cc: <stable(a)vger.kernel.org>
Reviewed-by: James Morse <james.morse(a)arm.com>
Tested-by: Tony Luck <tony.luck(a)intel.com>
Signed-off-by: Aili Yao <yaoaili(a)kingsoft.com>
---
drivers/acpi/apei/apei-base.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/acpi/apei/apei-base.c b/drivers/acpi/apei/apei-base.c
index 552fd9f..3294cc8 100644
--- a/drivers/acpi/apei/apei-base.c
+++ b/drivers/acpi/apei/apei-base.c
@@ -633,6 +633,10 @@ int apei_map_generic_address(struct acpi_generic_address *reg)
if (rc)
return rc;
+ /* IO space doesn't need mapping */
+ if (reg->space_id == ACPI_ADR_SPACE_SYSTEM_IO)
+ return 0;
+
if (!acpi_os_map_generic_address(reg))
return -ENXIO;
--
2.9.5
This is a note to let you know that I've just added the patch titled
tty: serial: imx: fix potential deadlock
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 33f16855dcb973f745c51882d0e286601ff3be2b Mon Sep 17 00:00:00 2001
From: Sam Nobs <samuel.nobs(a)taitradio.com>
Date: Tue, 10 Nov 2020 09:50:06 +1300
Subject: tty: serial: imx: fix potential deadlock
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Enabling the lock dependency validator has revealed
that the way spinlocks are used in the IMX serial
port could result in a deadlock.
Specifically, imx_uart_int() acquires a spinlock
without disabling the interrupts, meaning that another
interrupt could come along and try to acquire the same
spinlock, potentially causing the two to wait for each
other indefinitely.
Use spin_lock_irqsave() instead to disable interrupts
upon acquisition of the spinlock.
Fixes: c974991d2620 ("tty:serial:imx: use spin_lock instead of spin_lock_irqsave in isr")
Reviewed-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Signed-off-by: Sam Nobs <samuel.nobs(a)taitradio.com>
Link: https://lore.kernel.org/r/1604955006-9363-1-git-send-email-samuel.nobs@tait…
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/imx.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
index 1731d9728865..3c53a3c89959 100644
--- a/drivers/tty/serial/imx.c
+++ b/drivers/tty/serial/imx.c
@@ -942,8 +942,14 @@ static irqreturn_t imx_uart_int(int irq, void *dev_id)
struct imx_port *sport = dev_id;
unsigned int usr1, usr2, ucr1, ucr2, ucr3, ucr4;
irqreturn_t ret = IRQ_NONE;
+ unsigned long flags = 0;
- spin_lock(&sport->port.lock);
+ /*
+ * IRQs might not be disabled upon entering this interrupt handler,
+ * e.g. when interrupt handlers are forced to be threaded. To support
+ * this scenario as well, disable IRQs when acquiring the spinlock.
+ */
+ spin_lock_irqsave(&sport->port.lock, flags);
usr1 = imx_uart_readl(sport, USR1);
usr2 = imx_uart_readl(sport, USR2);
@@ -1013,7 +1019,7 @@ static irqreturn_t imx_uart_int(int irq, void *dev_id)
ret = IRQ_HANDLED;
}
- spin_unlock(&sport->port.lock);
+ spin_unlock_irqrestore(&sport->port.lock, flags);
return ret;
}
--
2.29.2