November 2020 - Linux-stable-mirror

stable-rc/queue/4.9 build: 2 builds: 0 failed, 2 passed (v4.9.243-42-gaedb439106403)

by kernelci.org bot

stable-rc/queue/4.9 build: 2 builds: 0 failed, 2 passed (v4.9.243-42-gaedb439106403) Full Build Summary: https://kernelci.org/build/stable-rc/branch/queue%2F4.9/kernel/v4.9.243-42-… Tree: stable-rc Branch: queue/4.9 Git Describe: v4.9.243-42-gaedb439106403 Git Commit: aedb439106403bf8967b240e3026d45894489852 Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git Built: 1 unique architecture ================================================================================ Detailed per-defconfig build reports: -------------------------------------------------------------------------------- exynos_defconfig (arm, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- imx_v4_v5_defconfig (arm, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches --- For more info write to <info(a)kernelci.org>

4 years, 11 months

1
0
0 0

[PATCH v2 2/4] Kbuild: do not emit debug info for assembly with LLVM_IAS=1

by Nick Desaulniers

Clang's integrated assembler produces the warning for assembly files: warning: DWARF2 only supports one section per compilation unit If -Wa,-gdwarf-* is unspecified, then debug info is not emitted. This will be re-enabled for new DWARF versions in a follow up patch. Enables defconfig+CONFIG_DEBUG_INFO to build cleanly with LLVM=1 LLVM_IAS=1 for x86_64 and arm64. Cc: <stable(a)vger.kernel.org> Link: https://github.com/ClangBuiltLinux/linux/issues/716 Reported-by: Nathan Chancellor <natechancellor(a)gmail.com> Suggested-by: Dmitry Golovin <dima(a)golovin.in> Suggested-by: Sedat Dilek <sedat.dilek(a)gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers(a)google.com> --- Makefile | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Makefile b/Makefile index f353886dbf44..75b1a3dcbf30 100644 --- a/Makefile +++ b/Makefile @@ -826,7 +826,9 @@ else DEBUG_CFLAGS += -g endif +ifndef LLVM_IAS KBUILD_AFLAGS += -Wa,-gdwarf-2 +endif ifdef CONFIG_DEBUG_INFO_DWARF4 DEBUG_CFLAGS += -gdwarf-4 -- 2.29.1.341.ge80a0c044ae-goog

4 years, 11 months

3
5
0 0

stable-rc/queue/4.19 build: 10 builds: 0 failed, 10 passed, 4 warnings (v4.19.157-62-gdee36feaf4bf)

by kernelci.org bot

stable-rc/queue/4.19 build: 10 builds: 0 failed, 10 passed, 4 warnings (v4.19.157-62-gdee36feaf4bf) Full Build Summary: https://kernelci.org/build/stable-rc/branch/queue%2F4.19/kernel/v4.19.157-6… Tree: stable-rc Branch: queue/4.19 Git Describe: v4.19.157-62-gdee36feaf4bf Git Commit: dee36feaf4bf77a52f6cb2cda611492f7b5049b3 Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git Built: 2 unique architectures Warnings Detected: arm: colibri_pxa300_defconfig (gcc-8): 1 warning pxa910_defconfig (gcc-8): 1 warning tct_hammer_defconfig (gcc-8): 1 warning mips: gcw0_defconfig (gcc-8): 1 warning Warnings summary: 4 /scratch/linux/drivers/clk/clk.c:49:27: warning: ‘orphan_list’ defined but not used [-Wunused-variable] ================================================================================ Detailed per-defconfig build reports: -------------------------------------------------------------------------------- acs5k_tiny_defconfig (arm, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- colibri_pxa270_defconfig (arm, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- colibri_pxa300_defconfig (arm, gcc-8) — PASS, 0 errors, 1 warning, 0 section mismatches Warnings: /scratch/linux/drivers/clk/clk.c:49:27: warning: ‘orphan_list’ defined but not used [-Wunused-variable] -------------------------------------------------------------------------------- gcw0_defconfig (mips, gcc-8) — PASS, 0 errors, 1 warning, 0 section mismatches Warnings: /scratch/linux/drivers/clk/clk.c:49:27: warning: ‘orphan_list’ defined but not used [-Wunused-variable] -------------------------------------------------------------------------------- mini2440_defconfig (arm, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- neponset_defconfig (arm, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- nuc960_defconfig (arm, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- pxa910_defconfig (arm, gcc-8) — PASS, 0 errors, 1 warning, 0 section mismatches Warnings: /scratch/linux/drivers/clk/clk.c:49:27: warning: ‘orphan_list’ defined but not used [-Wunused-variable] -------------------------------------------------------------------------------- realview_defconfig (arm, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- tct_hammer_defconfig (arm, gcc-8) — PASS, 0 errors, 1 warning, 0 section mismatches Warnings: /scratch/linux/drivers/clk/clk.c:49:27: warning: ‘orphan_list’ defined but not used [-Wunused-variable] --- For more info write to <info(a)kernelci.org>

4 years, 11 months

1
0
0 0

+ page_frag-recover-from-memory-pressure.patch added to -mm tree

by akpm＠linux-foundation.org

The patch titled Subject: mm, page_frag: recover from memory pressure has been added to the -mm tree. Its filename is page_frag-recover-from-memory-pressure.patch This patch should soon appear at https://ozlabs.org/~akpm/mmots/broken-out/page_frag-recover-from-memory-pre… and later at https://ozlabs.org/~akpm/mmotm/broken-out/page_frag-recover-from-memory-pre… Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next and is updated there every 3-4 working days ------------------------------------------------------ From: Dongli Zhang <dongli.zhang(a)oracle.com> Subject: mm, page_frag: recover from memory pressure The ethernet driver may allocate skb (and skb->data) via napi_alloc_skb(). This ends up to page_frag_alloc() to allocate skb->data from page_frag_cache->va. During the memory pressure, page_frag_cache->va may be allocated as pfmemalloc page. As a result, the skb->pfmemalloc is always true as skb->data is from page_frag_cache->va. The skb will be dropped if the sock (receiver) does not have SOCK_MEMALLOC. This is expected behaviour under memory pressure. However, once kernel is not under memory pressure any longer (suppose large amount of memory pages are just reclaimed), the page_frag_alloc() may still re-use the prior pfmemalloc page_frag_cache->va to allocate skb->data. As a result, the skb->pfmemalloc is always true unless page_frag_cache->va is re-allocated, even if the kernel is not under memory pressure any longer. Here is how kernel runs into issue. 1. The kernel is under memory pressure and allocation of PAGE_FRAG_CACHE_MAX_ORDER in __page_frag_cache_refill() will fail. Instead, the pfmemalloc page is allocated for page_frag_cache->va. 2. All skb->data from page_frag_cache->va (pfmemalloc) will have skb->pfmemalloc=true. The skb will always be dropped by sock without SOCK_MEMALLOC. This is an expected behaviour. 3. Suppose a large amount of pages are reclaimed and kernel is not under memory pressure any longer. We expect skb->pfmemalloc drop will not happen. 4. Unfortunately, page_frag_alloc() does not proactively re-allocate page_frag_alloc->va and will always re-use the prior pfmemalloc page. The skb->pfmemalloc is always true even kernel is not under memory pressure any longer. Fix this by freeing and re-allocating the page instead of recycling it. Link: https://lore.kernel.org/lkml/20201103193239.1807-1-dongli.zhang@oracle.com/ Link: https://lore.kernel.org/linux-mm/20201105042140.5253-1-willy@infradead.org/ Link: https://lkml.kernel.org/r/20201115201029.11903-1-dongli.zhang@oracle.com Fixes: 79930f5892e ("net: do not deplete pfmemalloc reserve") Signed-off-by: Dongli Zhang <dongli.zhang(a)oracle.com> Suggested-by: Matthew Wilcox (Oracle) <willy(a)infradead.org> Acked-by: Vlastimil Babka <vbabka(a)suse.cz> Reviewed-by: Eric Dumazet <edumazet(a)google.com> Cc: Aruna Ramakrishna <aruna.ramakrishna(a)oracle.com> Cc: Bert Barbe <bert.barbe(a)oracle.com> Cc: Rama Nichanamatlu <rama.nichanamatlu(a)oracle.com> Cc: Venkat Venkatsubra <venkat.x.venkatsubra(a)oracle.com> Cc: Manjunath Patil <manjunath.b.patil(a)oracle.com> Cc: Joe Jin <joe.jin(a)oracle.com> Cc: SRINIVAS <srinivas.eeda(a)oracle.com> Cc: David S. Miller <davem(a)davemloft.net> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/page_alloc.c | 5 +++++ 1 file changed, 5 insertions(+) --- a/mm/page_alloc.c~page_frag-recover-from-memory-pressure +++ a/mm/page_alloc.c @@ -5103,6 +5103,11 @@ refill: if (!page_ref_sub_and_test(page, nc->pagecnt_bias)) goto refill; + if (unlikely(nc->pfmemalloc)) { + free_the_page(page, compound_order(page)); + goto refill; + } + #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) /* if size can vary use size else just use PAGE_SIZE */ size = nc->size; _ Patches currently in -mm which might be from dongli.zhang(a)oracle.com are page_frag-recover-from-memory-pressure.patch

4 years, 11 months

1
0
0 0

stable-rc/queue/5.4 build: 9 builds: 0 failed, 9 passed, 3 warnings (v5.4.77-105-gc72d10024c02)

by kernelci.org bot

stable-rc/queue/5.4 build: 9 builds: 0 failed, 9 passed, 3 warnings (v5.4.77-105-gc72d10024c02) Full Build Summary: https://kernelci.org/build/stable-rc/branch/queue%2F5.4/kernel/v5.4.77-105-… Tree: stable-rc Branch: queue/5.4 Git Describe: v5.4.77-105-gc72d10024c02 Git Commit: c72d10024c02c6cf7bf5684ecc1d1ab2b3ef7908 Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git Built: 3 unique architectures Warnings Detected: arm: s5pv210_defconfig (gcc-8): 1 warning sama5_defconfig (gcc-8): 1 warning mips: x86_64: tinyconfig (gcc-8): 1 warning Warnings summary: 2 WARNING: "return_address" [vmlinux] is a static EXPORT_SYMBOL_GPL 1 .config:1156:warning: override: UNWINDER_GUESS changes choice state ================================================================================ Detailed per-defconfig build reports: -------------------------------------------------------------------------------- aspeed_g4_defconfig (arm, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- bmips_be_defconfig (mips, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- malta_defconfig (mips, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- mips_paravirt_defconfig (mips, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- pic32mzda_defconfig (mips, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- pistachio_defconfig (mips, gcc-8) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- s5pv210_defconfig (arm, gcc-8) — PASS, 0 errors, 1 warning, 0 section mismatches Warnings: WARNING: "return_address" [vmlinux] is a static EXPORT_SYMBOL_GPL -------------------------------------------------------------------------------- sama5_defconfig (arm, gcc-8) — PASS, 0 errors, 1 warning, 0 section mismatches Warnings: WARNING: "return_address" [vmlinux] is a static EXPORT_SYMBOL_GPL -------------------------------------------------------------------------------- tinyconfig (x86_64, gcc-8) — PASS, 0 errors, 1 warning, 0 section mismatches Warnings: .config:1156:warning: override: UNWINDER_GUESS changes choice state --- For more info write to <info(a)kernelci.org>

4 years, 11 months

1
0
0 0

[PATCH] drm/i915: Handle max_bpc==16

by Ville Syrjala

From: Ville Syrjälä <ville.syrjala(a)linux.intel.com> EDID can declare the maximum supported bpc up to 16, and apparently there are displays that do so. Currently we assume 12 bpc is tha max. Fix the assumption and toss in a MISSING_CASE() for any other value we don't expect to see. This fixes modesets with a display with EDID max bpc > 12. Previously any modeset would just silently fail on platforms that didn't otherwise limit this via the max_bpc property. In particular we don't add the max_bpc property to HDMI ports on gmch platforms, and thus we would see the raw max_bpc coming from the EDID. I suppose we could already adjust this to also allow 16bpc, but seeing as no current platform supports that there is little point. Cc: stable(a)vger.kernel.org Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2632 Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com> --- drivers/gpu/drm/i915/display/intel_display.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index 2729c852c668..2a6eb1ca9c8e 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -13060,10 +13060,11 @@ compute_sink_pipe_bpp(const struct drm_connector_state *conn_state, case 10 ... 11: bpp = 10 * 3; break; - case 12: + case 12 ... 16: bpp = 12 * 3; break; default: + MISSING_CASE(conn_state->max_bpc); return -EINVAL; } -- 2.26.2

4 years, 11 months

2
1
0 0

FAILED: patch "[PATCH] reboot: fix overflow parsing reboot cpu number" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From df5b0ab3e08a156701b537809914b339b0daa526 Mon Sep 17 00:00:00 2001 From: Matteo Croce <mcroce(a)microsoft.com> Date: Fri, 13 Nov 2020 22:52:07 -0800 Subject: [PATCH] reboot: fix overflow parsing reboot cpu number Limit the CPU number to num_possible_cpus(), because setting it to a value lower than INT_MAX but higher than NR_CPUS produces the following error on reboot and shutdown: BUG: unable to handle page fault for address: ffffffff90ab1bb0 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 1c09067 P4D 1c09067 PUD 1c0a063 PMD 0 Oops: 0000 [#1] SMP CPU: 1 PID: 1 Comm: systemd-shutdow Not tainted 5.9.0-rc8-kvm #110 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 RIP: 0010:migrate_to_reboot_cpu+0xe/0x60 Code: ea ea 00 48 89 fa 48 c7 c7 30 57 f1 81 e9 fa ef ff ff 66 2e 0f 1f 84 00 00 00 00 00 53 8b 1d d5 ea ea 00 e8 14 33 fe ff 89 da <48> 0f a3 15 ea fc bd 00 48 89 d0 73 29 89 c2 c1 e8 06 65 48 8b 3c RSP: 0018:ffffc90000013e08 EFLAGS: 00010246 RAX: ffff88801f0a0000 RBX: 0000000077359400 RCX: 0000000000000000 RDX: 0000000077359400 RSI: 0000000000000002 RDI: ffffffff81c199e0 RBP: ffffffff81c1e3c0 R08: ffff88801f41f000 R09: ffffffff81c1e348 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 00007f32bedf8830 R14: 00000000fee1dead R15: 0000000000000000 FS: 00007f32bedf8980(0000) GS:ffff88801f480000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffff90ab1bb0 CR3: 000000001d057000 CR4: 00000000000006a0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __do_sys_reboot.cold+0x34/0x5b do_syscall_64+0x2d/0x40 Fixes: 1b3a5d02ee07 ("reboot: move arch/x86 reboot= handling to generic kernel") Signed-off-by: Matteo Croce <mcroce(a)microsoft.com> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> Cc: Arnd Bergmann <arnd(a)arndb.de> Cc: Fabian Frederick <fabf(a)skynet.be> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Cc: Guenter Roeck <linux(a)roeck-us.net> Cc: Kees Cook <keescook(a)chromium.org> Cc: Mike Rapoport <rppt(a)kernel.org> Cc: Pavel Tatashin <pasha.tatashin(a)soleen.com> Cc: Petr Mladek <pmladek(a)suse.com> Cc: Robin Holt <robinmholt(a)gmail.com> Cc: <stable(a)vger.kernel.org> Link: https://lkml.kernel.org/r/20201103214025.116799-3-mcroce@linux.microsoft.com Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> diff --git a/kernel/reboot.c b/kernel/reboot.c index 8fbba433725e..af6f23d8bea1 100644 --- a/kernel/reboot.c +++ b/kernel/reboot.c @@ -558,6 +558,13 @@ static int __init reboot_setup(char *str) reboot_cpu = simple_strtoul(str+3, NULL, 0); else *mode = REBOOT_SOFT; + if (reboot_cpu >= num_possible_cpus()) { + pr_err("Ignoring the CPU number in reboot= option. " + "CPU %d exceeds possible cpu number %d\n", + reboot_cpu, num_possible_cpus()); + reboot_cpu = 0; + break; + } break; case 'g':

4 years, 11 months

2
1
0 0

[merged] ocfs2-initialize-ip_next_orphan.patch removed from -mm tree

by akpm＠linux-foundation.org

The patch titled Subject: ocfs2: initialize ip_next_orphan has been removed from the -mm tree. Its filename was ocfs2-initialize-ip_next_orphan.patch This patch was dropped because it was merged into mainline or a subsystem tree ------------------------------------------------------ From: Wengang Wang <wen.gang.wang(a)oracle.com> Subject: ocfs2: initialize ip_next_orphan Though problem if found on a lower 4.1.12 kernel, I think upstream has same issue. In one node in the cluster, there is the following callback trace: # cat /proc/21473/stack [<ffffffffc09a2f06>] __ocfs2_cluster_lock.isra.36+0x336/0x9e0 [ocfs2] [<ffffffffc09a4481>] ocfs2_inode_lock_full_nested+0x121/0x520 [ocfs2] [<ffffffffc09b2ce2>] ocfs2_evict_inode+0x152/0x820 [ocfs2] [<ffffffff8122b36e>] evict+0xae/0x1a0 [<ffffffff8122bd26>] iput+0x1c6/0x230 [<ffffffffc09b60ed>] ocfs2_orphan_filldir+0x5d/0x100 [ocfs2] [<ffffffffc0992ae0>] ocfs2_dir_foreach_blk+0x490/0x4f0 [ocfs2] [<ffffffffc099a1e9>] ocfs2_dir_foreach+0x29/0x30 [ocfs2] [<ffffffffc09b7716>] ocfs2_recover_orphans+0x1b6/0x9a0 [ocfs2] [<ffffffffc09b9b4e>] ocfs2_complete_recovery+0x1de/0x5c0 [ocfs2] [<ffffffff810a1399>] process_one_work+0x169/0x4a0 [<ffffffff810a1bcb>] worker_thread+0x5b/0x560 [<ffffffff810a7a2b>] kthread+0xcb/0xf0 [<ffffffff816f5d21>] ret_from_fork+0x61/0x90 [<ffffffffffffffff>] 0xffffffffffffffff The above stack is not reasonable, the final iput shouldn't happen in ocfs2_orphan_filldir() function. Looking at the code, 2067 /* Skip inodes which are already added to recover list, since dio may 2068 * happen concurrently with unlink/rename */ 2069 if (OCFS2_I(iter)->ip_next_orphan) { 2070 iput(iter); 2071 return 0; 2072 } 2073 The logic thinks the inode is already in recover list on seeing ip_next_orphan is non-NULL, so it skip this inode after dropping a reference which incremented in ocfs2_iget(). While, if the inode is already in recover list, it should have another reference and the iput() at line 2070 should not be the final iput (dropping the last reference). So I don't think the inode is really in the recover list (no vmcore to confirm). Note that ocfs2_queue_orphans(), though not shown up in the call back trace, is holding cluster lock on the orphan directory when looking up for unlinked inodes. The on disk inode eviction could involve a lot of IOs which may need long time to finish. That means this node could hold the cluster lock for very long time, that can lead to the lock requests (from other nodes) to the orhpan directory hang for long time. Looking at more on ip_next_orphan, I found it's not initialized when allocating a new ocfs2_inode_info structure. This causes te reflink operations from some nodes hang for very long time waiting for the cluster lock on the orphan directory. Fix: initialize ip_next_orphan as NULL. Link: https://lkml.kernel.org/r/20201109171746.27884-1-wen.gang.wang@oracle.com Signed-off-by: Wengang Wang <wen.gang.wang(a)oracle.com> Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com> Cc: Mark Fasheh <mark(a)fasheh.com> Cc: Joel Becker <jlbec(a)evilplan.org> Cc: Junxiao Bi <junxiao.bi(a)oracle.com> Cc: Changwei Ge <gechangwei(a)live.cn> Cc: Gang He <ghe(a)suse.com> Cc: Jun Piao <piaojun(a)huawei.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/ocfs2/super.c | 1 + 1 file changed, 1 insertion(+) --- a/fs/ocfs2/super.c~ocfs2-initialize-ip_next_orphan +++ a/fs/ocfs2/super.c @@ -1713,6 +1713,7 @@ static void ocfs2_inode_init_once(void * oi->ip_blkno = 0ULL; oi->ip_clusters = 0; + oi->ip_next_orphan = NULL; ocfs2_resv_init_once(&oi->ip_la_data_resv); _ Patches currently in -mm which might be from wen.gang.wang(a)oracle.com are

4 years, 11 months

1
0
0 0

[merged] hugetlbfs-fix-anon-huge-page-migration-race.patch removed from -mm tree

by akpm＠linux-foundation.org

The patch titled Subject: hugetlbfs: fix anon huge page migration race has been removed from the -mm tree. Its filename was hugetlbfs-fix-anon-huge-page-migration-race.patch This patch was dropped because it was merged into mainline or a subsystem tree ------------------------------------------------------ From: Mike Kravetz <mike.kravetz(a)oracle.com> Subject: hugetlbfs: fix anon huge page migration race Qian Cai reported the following BUG in [1] [ 6147.019063][T45242] LTP: starting move_pages12 [ 6147.475680][T64921] BUG: unable to handle page fault for address: ffffffffffffffe0 ... [ 6147.525866][T64921] RIP: 0010:anon_vma_interval_tree_iter_first+0xa2/0x170 avc_start_pgoff at mm/interval_tree.c:63 [ 6147.620914][T64921] Call Trace: [ 6147.624078][T64921] rmap_walk_anon+0x141/0xa30 rmap_walk_anon at mm/rmap.c:1864 [ 6147.628639][T64921] try_to_unmap+0x209/0x2d0 try_to_unmap at mm/rmap.c:1763 [ 6147.633026][T64921] ? rmap_walk_locked+0x140/0x140 [ 6147.637936][T64921] ? page_remove_rmap+0x1190/0x1190 [ 6147.643020][T64921] ? page_not_mapped+0x10/0x10 [ 6147.647668][T64921] ? page_get_anon_vma+0x290/0x290 [ 6147.652664][T64921] ? page_mapcount_is_zero+0x10/0x10 [ 6147.657838][T64921] ? hugetlb_page_mapping_lock_write+0x97/0x180 [ 6147.663972][T64921] migrate_pages+0x1005/0x1fb0 [ 6147.668617][T64921] ? remove_migration_pte+0xac0/0xac0 [ 6147.673875][T64921] move_pages_and_store_status.isra.47+0xd7/0x1a0 [ 6147.680181][T64921] ? migrate_pages+0x1fb0/0x1fb0 [ 6147.685002][T64921] __x64_sys_move_pages+0xa5c/0x1100 [ 6147.690176][T64921] ? trace_hardirqs_on+0x20/0x1b5 [ 6147.695084][T64921] ? move_pages_and_store_status.isra.47+0x1a0/0x1a0 [ 6147.701653][T64921] ? rcu_read_lock_sched_held+0xaa/0xd0 [ 6147.707088][T64921] ? switch_fpu_return+0x196/0x400 [ 6147.712083][T64921] ? lockdep_hardirqs_on_prepare+0x38c/0x550 [ 6147.717954][T64921] ? do_syscall_64+0x24/0x310 [ 6147.722513][T64921] do_syscall_64+0x5f/0x310 [ 6147.726897][T64921] ? trace_hardirqs_off+0x12/0x1a0 [ 6147.731894][T64921] ? asm_exc_page_fault+0x8/0x30 [ 6147.736714][T64921] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Hugh Dickens diagnosed this as a migration bug caused by code introduced to use i_mmap_rwsem for pmd sharing synchronization. Specifically, the routine unmap_and_move_huge_page() is always passing the TTU_RMAP_LOCKED flag to try_to_unmap() while holding i_mmap_rwsem. This is wrong for anon pages as the anon_vma_lock should be held in this case. Further analysis suggested that i_mmap_rwsem was not required to he held at all when calling try_to_unmap for anon pages as an anon page could never be part of a shared pmd mapping. Discussion also revealed that the hack in hugetlb_page_mapping_lock_write to drop page lock and acquire i_mmap_rwsem is wrong. There is no way to keep mapping valid while dropping page lock. This patch does the following: - Do not take i_mmap_rwsem and set TTU_RMAP_LOCKED for anon pages when calling try_to_unmap. - Remove the hacky code in hugetlb_page_mapping_lock_write. The routine will now simply do a 'trylock' while still holding the page lock. If the trylock fails, it will return NULL. This could impact the callers: - migration calling code will receive -EAGAIN and retry up to the hard coded limit (10). - memory error code will treat the page as BUSY. This will force killing (SIGKILL) instead of SIGBUS any mapping tasks. Do note that this change in behavior only happens when there is a race. None of the standard kernel testing suites actually hit this race, but it is possible. [1] https://lore.kernel.org/lkml/20200708012044.GC992@lca.pw/ [2] https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2010071833100.2214@eggly.a… Link: https://lkml.kernel.org/r/20201105195058.78401-1-mike.kravetz@oracle.com Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com> Reported-by: Qian Cai <cai(a)lca.pw> Suggested-by: Hugh Dickins <hughd(a)google.com> Acked-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/hugetlb.c | 90 ++---------------------------------------- mm/memory-failure.c | 36 +++++++--------- mm/migrate.c | 46 +++++++++++---------- mm/rmap.c | 5 -- 4 files changed, 48 insertions(+), 129 deletions(-) --- a/mm/hugetlb.c~hugetlbfs-fix-anon-huge-page-migration-race +++ a/mm/hugetlb.c @@ -1568,103 +1568,23 @@ int PageHeadHuge(struct page *page_head) } /* - * Find address_space associated with hugetlbfs page. - * Upon entry page is locked and page 'was' mapped although mapped state - * could change. If necessary, use anon_vma to find vma and associated - * address space. The returned mapping may be stale, but it can not be - * invalid as page lock (which is held) is required to destroy mapping. - */ -static struct address_space *_get_hugetlb_page_mapping(struct page *hpage) -{ - struct anon_vma *anon_vma; - pgoff_t pgoff_start, pgoff_end; - struct anon_vma_chain *avc; - struct address_space *mapping = page_mapping(hpage); - - /* Simple file based mapping */ - if (mapping) - return mapping; - - /* - * Even anonymous hugetlbfs mappings are associated with an - * underlying hugetlbfs file (see hugetlb_file_setup in mmap - * code). Find a vma associated with the anonymous vma, and - * use the file pointer to get address_space. - */ - anon_vma = page_lock_anon_vma_read(hpage); - if (!anon_vma) - return mapping; /* NULL */ - - /* Use first found vma */ - pgoff_start = page_to_pgoff(hpage); - pgoff_end = pgoff_start + pages_per_huge_page(page_hstate(hpage)) - 1; - anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, - pgoff_start, pgoff_end) { - struct vm_area_struct *vma = avc->vma; - - mapping = vma->vm_file->f_mapping; - break; - } - - anon_vma_unlock_read(anon_vma); - return mapping; -} - -/* * Find and lock address space (mapping) in write mode. * - * Upon entry, the page is locked which allows us to find the mapping - * even in the case of an anon page. However, locking order dictates - * the i_mmap_rwsem be acquired BEFORE the page lock. This is hugetlbfs - * specific. So, we first try to lock the sema while still holding the - * page lock. If this works, great! If not, then we need to drop the - * page lock and then acquire i_mmap_rwsem and reacquire page lock. Of - * course, need to revalidate state along the way. + * Upon entry, the page is locked which means that page_mapping() is + * stable. Due to locking order, we can only trylock_write. If we can + * not get the lock, simply return NULL to caller. */ struct address_space *hugetlb_page_mapping_lock_write(struct page *hpage) { - struct address_space *mapping, *mapping2; + struct address_space *mapping = page_mapping(hpage); - mapping = _get_hugetlb_page_mapping(hpage); -retry: if (!mapping) return mapping; - /* - * If no contention, take lock and return - */ if (i_mmap_trylock_write(mapping)) return mapping; - /* - * Must drop page lock and wait on mapping sema. - * Note: Once page lock is dropped, mapping could become invalid. - * As a hack, increase map count until we lock page again. - */ - atomic_inc(&hpage->_mapcount); - unlock_page(hpage); - i_mmap_lock_write(mapping); - lock_page(hpage); - atomic_add_negative(-1, &hpage->_mapcount); - - /* verify page is still mapped */ - if (!page_mapped(hpage)) { - i_mmap_unlock_write(mapping); - return NULL; - } - - /* - * Get address space again and verify it is the same one - * we locked. If not, drop lock and retry. - */ - mapping2 = _get_hugetlb_page_mapping(hpage); - if (mapping2 != mapping) { - i_mmap_unlock_write(mapping); - mapping = mapping2; - goto retry; - } - - return mapping; + return NULL; } pgoff_t __basepage_index(struct page *page) --- a/mm/memory-failure.c~hugetlbfs-fix-anon-huge-page-migration-race +++ a/mm/memory-failure.c @@ -1057,27 +1057,25 @@ static bool hwpoison_user_mappings(struc if (!PageHuge(hpage)) { unmap_success = try_to_unmap(hpage, ttu); } else { - /* - * For hugetlb pages, try_to_unmap could potentially call - * huge_pmd_unshare. Because of this, take semaphore in - * write mode here and set TTU_RMAP_LOCKED to indicate we - * have taken the lock at this higer level. - * - * Note that the call to hugetlb_page_mapping_lock_write - * is necessary even if mapping is already set. It handles - * ugliness of potentially having to drop page lock to obtain - * i_mmap_rwsem. - */ - mapping = hugetlb_page_mapping_lock_write(hpage); - - if (mapping) { - unmap_success = try_to_unmap(hpage, + if (!PageAnon(hpage)) { + /* + * For hugetlb pages in shared mappings, try_to_unmap + * could potentially call huge_pmd_unshare. Because of + * this, take semaphore in write mode here and set + * TTU_RMAP_LOCKED to indicate we have taken the lock + * at this higer level. + */ + mapping = hugetlb_page_mapping_lock_write(hpage); + if (mapping) { + unmap_success = try_to_unmap(hpage, ttu|TTU_RMAP_LOCKED); - i_mmap_unlock_write(mapping); + i_mmap_unlock_write(mapping); + } else { + pr_info("Memory failure: %#lx: could not lock mapping for mapped huge page\n", pfn); + unmap_success = false; + } } else { - pr_info("Memory failure: %#lx: could not find mapping for mapped huge page\n", - pfn); - unmap_success = false; + unmap_success = try_to_unmap(hpage, ttu); } } if (!unmap_success) --- a/mm/migrate.c~hugetlbfs-fix-anon-huge-page-migration-race +++ a/mm/migrate.c @@ -1328,34 +1328,38 @@ static int unmap_and_move_huge_page(new_ goto put_anon; if (page_mapped(hpage)) { - /* - * try_to_unmap could potentially call huge_pmd_unshare. - * Because of this, take semaphore in write mode here and - * set TTU_RMAP_LOCKED to let lower levels know we have - * taken the lock. - */ - mapping = hugetlb_page_mapping_lock_write(hpage); - if (unlikely(!mapping)) - goto unlock_put_anon; - - try_to_unmap(hpage, - TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS| - TTU_RMAP_LOCKED); + bool mapping_locked = false; + enum ttu_flags ttu = TTU_MIGRATION|TTU_IGNORE_MLOCK| + TTU_IGNORE_ACCESS; + + if (!PageAnon(hpage)) { + /* + * In shared mappings, try_to_unmap could potentially + * call huge_pmd_unshare. Because of this, take + * semaphore in write mode here and set TTU_RMAP_LOCKED + * to let lower levels know we have taken the lock. + */ + mapping = hugetlb_page_mapping_lock_write(hpage); + if (unlikely(!mapping)) + goto unlock_put_anon; + + mapping_locked = true; + ttu |= TTU_RMAP_LOCKED; + } + + try_to_unmap(hpage, ttu); page_was_mapped = 1; - /* - * Leave mapping locked until after subsequent call to - * remove_migration_ptes() - */ + + if (mapping_locked) + i_mmap_unlock_write(mapping); } if (!page_mapped(hpage)) rc = move_to_new_page(new_hpage, hpage, mode); - if (page_was_mapped) { + if (page_was_mapped) remove_migration_ptes(hpage, - rc == MIGRATEPAGE_SUCCESS ? new_hpage : hpage, true); - i_mmap_unlock_write(mapping); - } + rc == MIGRATEPAGE_SUCCESS ? new_hpage : hpage, false); unlock_put_anon: unlock_page(new_hpage); --- a/mm/rmap.c~hugetlbfs-fix-anon-huge-page-migration-race +++ a/mm/rmap.c @@ -1413,9 +1413,6 @@ static bool try_to_unmap_one(struct page /* * If sharing is possible, start and end will be adjusted * accordingly. - * - * If called for a huge page, caller must hold i_mmap_rwsem - * in write mode as it is possible to call huge_pmd_unshare. */ adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end); @@ -1462,7 +1459,7 @@ static bool try_to_unmap_one(struct page subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte); address = pvmw.address; - if (PageHuge(page)) { + if (PageHuge(page) && !PageAnon(page)) { /* * To call huge_pmd_unshare, i_mmap_rwsem must be * held in write mode. Caller needs to explicitly _ Patches currently in -mm which might be from mike.kravetz(a)oracle.com are

4 years, 11 months

1
0
0 0

[merged] reboot-fix-overflow-parsing-reboot-cpu-number.patch removed from -mm tree

by akpm＠linux-foundation.org

The patch titled Subject: reboot: fix overflow parsing reboot cpu number has been removed from the -mm tree. Its filename was reboot-fix-overflow-parsing-reboot-cpu-number.patch This patch was dropped because it was merged into mainline or a subsystem tree ------------------------------------------------------ From: Matteo Croce <mcroce(a)microsoft.com> Subject: reboot: fix overflow parsing reboot cpu number Limit the CPU number to num_possible_cpus(), because setting it to a value lower than INT_MAX but higher than NR_CPUS produces the following error on reboot and shutdown: BUG: unable to handle page fault for address: ffffffff90ab1bb0 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 1c09067 P4D 1c09067 PUD 1c0a063 PMD 0 Oops: 0000 [#1] SMP CPU: 1 PID: 1 Comm: systemd-shutdow Not tainted 5.9.0-rc8-kvm #110 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 RIP: 0010:migrate_to_reboot_cpu+0xe/0x60 Code: ea ea 00 48 89 fa 48 c7 c7 30 57 f1 81 e9 fa ef ff ff 66 2e 0f 1f 84 00 00 00 00 00 53 8b 1d d5 ea ea 00 e8 14 33 fe ff 89 da <48> 0f a3 15 ea fc bd 00 48 89 d0 73 29 89 c2 c1 e8 06 65 48 8b 3c RSP: 0018:ffffc90000013e08 EFLAGS: 00010246 RAX: ffff88801f0a0000 RBX: 0000000077359400 RCX: 0000000000000000 RDX: 0000000077359400 RSI: 0000000000000002 RDI: ffffffff81c199e0 RBP: ffffffff81c1e3c0 R08: ffff88801f41f000 R09: ffffffff81c1e348 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 00007f32bedf8830 R14: 00000000fee1dead R15: 0000000000000000 FS: 00007f32bedf8980(0000) GS:ffff88801f480000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffff90ab1bb0 CR3: 000000001d057000 CR4: 00000000000006a0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __do_sys_reboot.cold+0x34/0x5b ? vfs_writev+0x92/0xc0 ? do_writev+0x52/0xd0 do_syscall_64+0x2d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f32bfaaecd3 Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 89 fa be 69 19 12 28 bf ad de e1 fe b8 a9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 89 81 0c 00 f7 d8 RSP: 002b:00007fff6265fb58 EFLAGS: 00000202 ORIG_RAX: 00000000000000a9 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f32bfaaecd3 RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead RBP: 0000000000000000 R08: 0000000000008020 R09: 00007fff6265ef60 R10: 00007f32bedf8830 R11: 0000000000000202 R12: 0000000000000000 R13: 0000557bba2c51c0 R14: 0000000000000000 R15: 00007fff6265fbc8 CR2: ffffffff90ab1bb0 ---[ end trace b813e80157136563 ]--- RIP: 0010:migrate_to_reboot_cpu+0xe/0x60 Code: ea ea 00 48 89 fa 48 c7 c7 30 57 f1 81 e9 fa ef ff ff 66 2e 0f 1f 84 00 00 00 00 00 53 8b 1d d5 ea ea 00 e8 14 33 fe ff 89 da <48> 0f a3 15 ea fc bd 00 48 89 d0 73 29 89 c2 c1 e8 06 65 48 8b 3c RSP: 0018:ffffc90000013e08 EFLAGS: 00010246 RAX: ffff88801f0a0000 RBX: 0000000077359400 RCX: 0000000000000000 RDX: 0000000077359400 RSI: 0000000000000002 RDI: ffffffff81c199e0 RBP: ffffffff81c1e3c0 R08: ffff88801f41f000 R09: ffffffff81c1e348 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 00007f32bedf8830 R14: 00000000fee1dead R15: 0000000000000000 FS: 00007f32bedf8980(0000) GS:ffff88801f480000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffff90ab1bb0 CR3: 000000001d057000 CR4: 00000000000006a0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 Kernel Offset: disabled ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]--- Link: https://lkml.kernel.org/r/20201103214025.116799-3-mcroce@linux.microsoft.com Fixes: 1b3a5d02ee07 ("reboot: move arch/x86 reboot= handling to generic kernel") Signed-off-by: Matteo Croce <mcroce(a)microsoft.com> Cc: Arnd Bergmann <arnd(a)arndb.de> Cc: Fabian Frederick <fabf(a)skynet.be> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Cc: Guenter Roeck <linux(a)roeck-us.net> Cc: Kees Cook <keescook(a)chromium.org> Cc: Mike Rapoport <rppt(a)kernel.org> Cc: Pavel Tatashin <pasha.tatashin(a)soleen.com> Cc: Petr Mladek <pmladek(a)suse.com> Cc: Robin Holt <robinmholt(a)gmail.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- kernel/reboot.c | 7 +++++++ 1 file changed, 7 insertions(+) --- a/kernel/reboot.c~reboot-fix-overflow-parsing-reboot-cpu-number +++ a/kernel/reboot.c @@ -558,6 +558,13 @@ static int __init reboot_setup(char *str reboot_cpu = simple_strtoul(str+3, NULL, 0); else *mode = REBOOT_SOFT; + if (reboot_cpu >= num_possible_cpus()) { + pr_err("Ignoring the CPU number in reboot= option. " + "CPU %d exceeds possible cpu number %d\n", + reboot_cpu, num_possible_cpus()); + reboot_cpu = 0; + break; + } break; case 'g': _ Patches currently in -mm which might be from mcroce(a)microsoft.com are reboot-refactor-and-comment-the-cpu-selection-code.patch reboot-allow-to-specify-reboot-mode-via-sysfs.patch reboot-remove-cf9_safe-from-allowed-types-and-rename-cf9_force.patch

4 years, 11 months

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror November 2020