The networking maintainer keeps a public list of the patches being
queued up for the next round of stable releases. Be sure to check there
before asking for a patch to be applied so that you do not waste
people's time.
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Acked-by: David S. Miller <davem(a)davemloft.net>
---
Jon, I can take this through one of my trees if you don't want to, whichever
is easier for you.
diff --git a/Documentation/process/stable-kernel-rules.rst b/Documentation/process/stable-kernel-rules.rst
index 0de6f6145cc6..7ba8cd567f84 100644
--- a/Documentation/process/stable-kernel-rules.rst
+++ b/Documentation/process/stable-kernel-rules.rst
@@ -38,6 +38,9 @@ Procedure for submitting patches to the -stable tree
- If the patch covers files in net/ or drivers/net please follow netdev stable
submission guidelines as described in
:ref:`Documentation/networking/netdev-FAQ.rst <netdev-FAQ>`
+ after first checking the stable networking queue at
+ https://patchwork.ozlabs.org/bundle/davem/stable/?series=&submitter=&state=…
+ to ensure the requested patch is not already queued up.
- Security patches should not be handled (solely) by the -stable review
process but should follow the procedures in
:ref:`Documentation/admin-guide/security-bugs.rst <securitybugs>`.
On 1/31/19 6:12 AM, Sasha Levin wrote:
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: 290408d4a250 hugetlb: hugepage migration core.
>
> The bot has tested the following trees: v4.20.5, v4.19.18, v4.14.96, v4.9.153, v4.4.172, v3.18.133.
>
> v4.20.5: Build OK!
> v4.19.18: Build OK!
> v4.14.96: Build OK!
> v4.9.153: Failed to apply! Possible dependencies:
> 2916ecc0f9d4 ("mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY")
>
> v4.4.172: Failed to apply! Possible dependencies:
> 09cbfeaf1a5a ("mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros")
> 0e749e54244e ("dax: increase granularity of dax_clear_blocks() operations")
> 2916ecc0f9d4 ("mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY")
> 2a28900be206 ("udf: Export superblock magic to userspace")
> 4420cfd3f51c ("staging: lustre: format properly all comment blocks for LNet core")
> 48b4800a1c6a ("zsmalloc: page migration support")
> 5057dcd0f1aa ("virtio_balloon: export 'available' memory to balloon statistics")
> 52db400fcd50 ("pmem, dax: clean up clear_pmem()")
> 5b7a487cf32d ("f2fs: add customized migrate_page callback")
> 5fd88337d209 ("staging: lustre: fix all conditional comparison to zero in LNet layer")
> a188222b6ed2 ("net: Rename NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK")
> b1123ea6d3b3 ("mm: balloon: use general non-lru movable page feature")
> b2e0d1625e19 ("dax: fix lifetime of in-kernel dax mappings with dax_map_atomic()")
> bda807d44454 ("mm: migrate: support non-lru movable page migration")
> c8b8e32d700f ("direct-io: eliminate the offset argument to ->direct_IO")
> d1a5f2b4d8a1 ("block: use DAX for partition table reads")
> e10624f8c097 ("pmem: fail io-requests to known bad blocks")
>
> v3.18.133: Failed to apply! Possible dependencies:
> 0722b1011a5f ("f2fs: set page private for inmemory pages for truncation")
> 1601839e9e5b ("f2fs: fix to release count of meta page in ->invalidatepage")
> 2916ecc0f9d4 ("mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY")
> 31a3268839c1 ("f2fs: cleanup if-statement of phase in gc_data_segment")
> 34ba94bac938 ("f2fs: do not make dirty any inmemory pages")
> 34d67debe02b ("f2fs: add infra struct and helper for inline dir")
> 4634d71ed190 ("f2fs: fix missing kmem_cache_free")
> 487261f39bcd ("f2fs: merge {invalidate,release}page for meta/node/data pages")
> 5b7a487cf32d ("f2fs: add customized migrate_page callback")
> 67298804f344 ("f2fs: introduce struct inode_management to wrap inner fields")
> 769ec6e5b7d4 ("f2fs: call radix_tree_preload before radix_tree_insert")
> 7dda2af83b2b ("f2fs: more fast lookup for gc_inode list")
> 8b26ef98da33 ("f2fs: use rw_semaphore for nat entry lock")
> 8c402946f074 ("f2fs: introduce the number of inode entries")
> 9be32d72becc ("f2fs: do retry operations with cond_resched")
> 9e4ded3f309e ("f2fs: activate f2fs_trace_pid")
> d5053a34a9cc ("f2fs: introduce -o fastboot for reducing booting time only")
> e5e7ea3c86e5 ("f2fs: control the memory footprint used by ino entries")
> f68daeebba5a ("f2fs: keep PagePrivate during releasepage")
>
>
> How should we proceed with this patch?
Hello automated Sasha,
First, let's wait for review/ack. However, the patch does not strictly
'depend' on the functionality of the commits in the lists above. If/when
this goes upstream I can provide backports for 4.9, 4.4 and 3.18.
--
Mike Kravetz
From: David Hildenbrand <david(a)redhat.com>
Subject: mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
We had a race in the old balloon compaction code, before b1123ea6d3b3 ("mm:
balloon: use general non-lru movable page feature") refactored it, that
became visible after backporting 195a8c43e93d ("virtio-balloon: deflate
via a page list") without the refactoring.
The bug existed from commit d6d86c0a7f8d ("mm/balloon_compaction: redesign
ballooned pages management") until b1123ea6d3b3 ("mm: balloon: use general
non-lru movable page feature"). d6d86c0a7f8d ("mm/balloon_compaction:
redesign ballooned pages management") was backported to 3.12, so the
broken kernels are the stable kernels [3.12 - 4.7].
There was a subtle race between dropping the page lock of the newpage
in __unmap_and_move() and checking for
__is_movable_balloon_page(newpage).
Just after dropping this page lock, virtio-balloon could go ahead and
deflate the newpage, effectively dequeueing it and clearing PageBalloon,
in turn making __is_movable_balloon_page(newpage) fail.
This resulted in dropping the reference of the newpage via
putback_lru_page(newpage) instead of put_page(newpage), leading to
page->lru getting modified and a !LRU page ending up in the LRU lists.
With 195a8c43e93d ("virtio-balloon: deflate via a page list") backported,
one would suddenly get corrupted lists in release_pages_balloon():
- WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
- list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100
Nowadays this race is no longer possible, but it is hidden behind very
ugly handling of __ClearPageMovable() and __PageMovable().
__ClearPageMovable() will not make __PageMovable() fail, only
PageMovable(). So the new check (__PageMovable(newpage)) will still hold
even after newpage was dequeued by virtio-balloon.
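As a side note for readers unfamiliar with the flag encoding, here is a
minimal stand-alone sketch (a simplified model in plain C, not the kernel
source) of why __ClearPageMovable() leaves the __PageMovable() check true:
the movable state is a tag in the low bits of page->mapping, and
__ClearPageMovable() deliberately keeps that tag while dropping the mapping
pointer.

#include <stdio.h>
#include <stdint.h>

/* Simplified stand-ins for PAGE_MAPPING_MOVABLE / PAGE_MAPPING_FLAGS. */
#define MAPPING_MOVABLE	0x2UL
#define MAPPING_FLAGS	0x3UL

struct page { uintptr_t mapping; };

/* Models __SetPageMovable(): store the mapping with the MOVABLE tag. */
static void set_movable(struct page *p, uintptr_t mapping)
{
	p->mapping = mapping | MAPPING_MOVABLE;
}

/* Models __ClearPageMovable(): drop the mapping pointer but keep the tag. */
static void clear_movable(struct page *p)
{
	p->mapping &= MAPPING_MOVABLE;
}

/* Models __PageMovable(): looks only at the tag bits. */
static int raw_movable(const struct page *p)
{
	return (p->mapping & MAPPING_FLAGS) == MAPPING_MOVABLE;
}

int main(void)
{
	struct page newpage = { 0 };

	set_movable(&newpage, 0x1000);	/* isolated balloon page */
	clear_movable(&newpage);	/* what deflation effectively does */
	/* Still prints 1 - the check the patch stops relying on. */
	printf("raw_movable after clear: %d\n", raw_movable(&newpage));
	return 0;
}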
If anybody were ever to change that special handling, the BUG would be
reintroduced. So instead, make it explicit and use the state of the
original isolated page, recorded before migration.
This patch can be backported fairly easily to stable kernels (in contrast to
the refactoring).
Link: http://lkml.kernel.org/r/20190129233217.10747-1-david@redhat.com
Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Vratislav Bendel <vbendel(a)redhat.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Rafael Aquini <aquini(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Dominik Brodowski <linux(a)dominikbrodowski.net>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Vratislav Bendel <vbendel(a)redhat.com>
Cc: Rafael Aquini <aquini(a)redhat.com>
Cc: Konstantin Khlebnikov <k.khlebnikov(a)samsung.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [3.12 - 4.7]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
--- a/mm/migrate.c~mm-migrate-dont-rely-on-__pagemovable-of-newpage-after-unlocking-it
+++ a/mm/migrate.c
@@ -1130,10 +1130,13 @@ out:
* If migration is successful, decrease refcount of the newpage
* which will not free the page because new page owner increased
* refcounter. As well, if it is LRU page, add the page to LRU
- * list in here.
+ * list in here. Use the old state of the isolated source page to
+ * determine if we migrated a LRU page. newpage was already unlocked
+ * and possibly modified by its owner - don't rely on the page
+ * state.
*/
if (rc == MIGRATEPAGE_SUCCESS) {
- if (unlikely(__PageMovable(newpage)))
+ if (unlikely(!is_lru))
put_page(newpage);
else
putback_lru_page(newpage);
_
From: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Subject: mm: hwpoison: use do_send_sig_info() instead of force_sig()
Currently memory_failure() is racy against process's exiting, which
results in kernel crash by null pointer dereference.
The root cause is that memory_failure() uses force_sig() to forcibly kill
asynchronous (meaning not in the current context) processes. As discussed
in thread https://lkml.org/lkml/2010/6/8/236 years ago for OOM fixes, this
is not the right thing to do. OOM solves this issue by using
do_send_sig_info() as done in commit d2d393099de2 ("signal: oom_kill_task:
use SEND_SIG_FORCED instead of force_sig()"), so this patch does the same
for hwpoison. do_send_sig_info() properly accesses the siglock with
lock_task_sighand(), so it is free from the reported race.
I confirmed that the reported bug reproduces when some delay is inserted in
kill_procs(), and that it never reproduces with this patch.
Note that memory_failure() can send another type of signal using
force_sig_mceerr(), and the reported race shouldn't happen with it because
force_sig_mceerr() is called only for synchronous processes (i.e.
BUS_MCEERR_AR happens only when some process accesses the corrupted
memory).
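For readers who don't have the signal code in their head, the following is a
small stand-alone model (toy types and names, not the kernel implementation)
of the difference described above: the unsafe path dereferences the task's
signal state blindly, while the safe path re-validates it under a lock shared
with the exit path, which is roughly what lock_task_sighand() provides for
do_send_sig_info().

#include <stdio.h>
#include <pthread.h>

/* Toy model of a task whose signal state may vanish while it exits. */
struct sighand { pthread_mutex_t siglock; };
struct task    { struct sighand *sighand; };

static pthread_mutex_t exit_lock = PTHREAD_MUTEX_INITIALIZER;

/* Unsafe pattern: NULL dereference if the task has already exited. */
static int kill_unsafe(struct task *tsk)
{
	pthread_mutex_lock(&tsk->sighand->siglock);	/* may crash */
	/* ... queue SIGKILL ... */
	pthread_mutex_unlock(&tsk->sighand->siglock);
	return 0;
}

/* Safe pattern: check that the signal state still exists, under a lock
 * that the exit path also takes before tearing the state down. */
static int kill_safe(struct task *tsk)
{
	int ret = -1;

	pthread_mutex_lock(&exit_lock);
	if (tsk->sighand) {
		pthread_mutex_lock(&tsk->sighand->siglock);
		/* ... queue SIGKILL ... */
		pthread_mutex_unlock(&tsk->sighand->siglock);
		ret = 0;
	}
	pthread_mutex_unlock(&exit_lock);
	return ret;
}

int main(void)
{
	struct task exited = { .sighand = NULL };

	/* kill_unsafe(&exited) would crash, as memory_failure() could. */
	(void)kill_unsafe;
	printf("kill_safe on an exited task returns %d\n", kill_safe(&exited));
	return 0;
}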
Link: http://lkml.kernel.org/r/20190116093046.GA29835@hori1.linux.bs1.fc.nec.co.jp
Signed-off-by: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Reported-by: Jane Chu <jane.chu(a)oracle.com>
Reviewed-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: William Kucharski <william.kucharski(a)oracle.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/memory-failure.c~mm-hwpoison-use-do_send_sig_info-instead-of-force_sig-re-pmem-error-handling-forces-sigkill-causes-kernel-panic
+++ a/mm/memory-failure.c
@@ -372,7 +372,8 @@ static void kill_procs(struct list_head
if (fail || tk->addr_valid == 0) {
pr_err("Memory failure: %#lx: forcibly killing %s:%d because of failure to unmap corrupted page\n",
pfn, tk->tsk->comm, tk->tsk->pid);
- force_sig(SIGKILL, tk->tsk);
+ do_send_sig_info(SIGKILL, SEND_SIG_PRIV,
+ tk->tsk, PIDTYPE_PID);
}
/*
_
From: Shakeel Butt <shakeelb(a)google.com>
Subject: mm, oom: fix use-after-free in oom_kill_process
A syzbot instance running on an upstream kernel found a use-after-free bug in
oom_kill_process(). On further inspection it seems the process selected to be
oom-killed had exited even before reaching read_lock(&tasklist_lock) in
oom_kill_process(). More specifically, tsk->usage was 1, due only to the
get_task_struct() in oom_evaluate_task(), so the put_task_struct() within
for_each_thread() freed the tsk while for_each_thread() was still trying to
access it. The easiest fix is to do a get/put across the for_each_thread()
on the selected task.
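The fix itself is in the diff below; as a generic illustration of the pattern
(plain C with a toy refcount, not the kernel API), pinning an object with an
extra reference across a loop during which the last other reference may be
dropped looks like this:

#include <stdio.h>
#include <stdlib.h>

/* Toy refcounted object, standing in for task_struct. */
struct obj {
	int refs;
	int data[4];
};

static void obj_get(struct obj *o) { o->refs++; }

static void obj_put(struct obj *o)
{
	if (--o->refs == 0)
		free(o);
}

int main(void)
{
	struct obj *p = calloc(1, sizeof(*p));

	p->refs = 1;	/* reference held on the selected task */

	obj_get(p);	/* pin it, like get_task_struct(p) in the fix */
	for (int i = 0; i < 4; i++) {
		if (i == 0)
			obj_put(p);	/* a concurrent exit drops its reference */
		p->data[i] = i;		/* still safe: we hold our own pin */
	}
	obj_put(p);	/* drop the pin, like put_task_struct(p) */
	return 0;
}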
Now, the next question is: should we continue with the oom-kill when the
previously selected task has exited? However, before adding more
complexity and heuristics, let's answer why we even look at the children
of the oom-kill selected task. select_bad_process() has already selected
the worst process in the system/memcg. Due to the race, the selected process
might not be the worst at kill time, but does that matter? Userspace
can use the oom_score_adj interface to prefer that children be killed
before the parent. I looked at the history, but it seems this behavior
predates git history.
Link: http://lkml.kernel.org/r/20190121215850.221745-1-shakeelb@google.com
Reported-by: syzbot+7fbbfa368521945f0e3d(a)syzkaller.appspotmail.com
Fixes: 6b0c81b3be11 ("mm, oom: reduce dependency on tasklist_lock")
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Roman Gushchin <guro(a)fb.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/oom_kill.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/mm/oom_kill.c~mm-oom-fix-use-after-free-in-oom_kill_process
+++ a/mm/oom_kill.c
@@ -975,6 +975,13 @@ static void oom_kill_process(struct oom_
* still freeing memory.
*/
read_lock(&tasklist_lock);
+
+ /*
+ * The task 'p' might have already exited before reaching here. The
+ * put_task_struct() will free task_struct 'p' while the loop still try
+ * to access the field of 'p', so, get an extra reference.
+ */
+ get_task_struct(p);
for_each_thread(p, t) {
list_for_each_entry(child, &t->children, sibling) {
unsigned int child_points;
@@ -994,6 +1001,7 @@ static void oom_kill_process(struct oom_
}
}
}
+ put_task_struct(p);
read_unlock(&tasklist_lock);
/*
_
From: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Subject: oom, oom_reaper: do not enqueue same task twice
Arkadiusz reported that enabling memcg's group oom killing causes strange
memcg statistics where there is no task in a memcg even though the task
counter for that memcg is not 0. It turned out that there is a bug in
wake_oom_reaper() which allows enqueueing the same task twice, and the
resulting refcount leak makes it impossible to decrease the number of tasks
in that memcg.
This bug has existed since the OOM reaper became invokable from the
task_will_free_mem(current) path in out_of_memory() in Linux 4.7:
T1@P1     |T2@P1     |T3@P1     |OOM reaper
----------+----------+----------+------------
                                 # Processing an OOM victim in a different memcg domain.
                      try_charge()
                        mem_cgroup_out_of_memory()
                          mutex_lock(&oom_lock)
           try_charge()
             mem_cgroup_out_of_memory()
               mutex_lock(&oom_lock)
try_charge()
  mem_cgroup_out_of_memory()
    mutex_lock(&oom_lock)
                      out_of_memory()
                        oom_kill_process(P1)
                          do_send_sig_info(SIGKILL, @P1)
                          mark_oom_victim(T1@P1)
                          wake_oom_reaper(T1@P1) # T1@P1 is enqueued.
                      mutex_unlock(&oom_lock)
           out_of_memory()
             mark_oom_victim(T2@P1)
             wake_oom_reaper(T2@P1) # T2@P1 is enqueued.
           mutex_unlock(&oom_lock)
out_of_memory()
  mark_oom_victim(T1@P1)
  wake_oom_reaper(T1@P1) # T1@P1 is enqueued again due to oom_reaper_list == T2@P1 && T1@P1->oom_reaper_list == NULL.
mutex_unlock(&oom_lock)
                                 # Completed processing an OOM victim in a different memcg domain.
                                 spin_lock(&oom_reaper_lock)
                                 # T1@P1 is dequeued.
                                 spin_unlock(&oom_reaper_lock)
Memcg's group oom killing, however, made it much easier to trigger this bug
by calling wake_oom_reaper() on the same task from a single out_of_memory()
request.
Fix this bug using the approach taken by commit 855b018325737f76 ("oom,
oom_reaper: disable oom_reaper for oom_kill_allocating_task"). As a side
effect, this patch also avoids enqueueing multiple threads sharing memory
via the task_will_free_mem(current) path.
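For reference, the enqueue-once idiom the fix relies on is a plain atomic
test-and-set on a per-mm flag bit; a minimal stand-alone sketch follows
(C11 atomics instead of the kernel's test_and_set_bit(), purely
illustrative):

#include <stdatomic.h>
#include <stdio.h>

/* Stand-in for the MMF_OOM_REAP_QUEUED bit in mm->flags. */
static atomic_ulong mm_flags;
#define OOM_REAP_QUEUED_BIT 26

/* Returns 1 only for the first caller; later callers see the bit already
 * set and skip the enqueue, which is what prevents double-queueing. */
static int queue_for_reaping(void)
{
	unsigned long bit = 1UL << OOM_REAP_QUEUED_BIT;

	if (atomic_fetch_or(&mm_flags, bit) & bit)
		return 0;	/* already queued, nothing to do */
	/* ... add the victim mm to the reaper list here ... */
	return 1;
}

int main(void)
{
	printf("first  call enqueues: %d\n", queue_for_reaping());
	printf("second call enqueues: %d\n", queue_for_reaping());
	return 0;
}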
Link: http://lkml.kernel.org/r/e865a044-2c10-9858-f4ef-254bc71d6cc2@i-love.sakura…
Link: http://lkml.kernel.org/r/5ee34fc6-1485-34f8-8790-903ddabaa809@i-love.sakura…
Fixes: af8e15cc85a25315 ("oom, oom_reaper: do not enqueue task if it is on the oom_reaper_list head")
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Reported-by: Arkadiusz Miskiewicz <arekm(a)maven.pl>
Tested-by: Arkadiusz Miskiewicz <arekm(a)maven.pl>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Roman Gushchin <guro(a)fb.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Aleksa Sarai <asarai(a)suse.de>
Cc: Jay Kamat <jgkamat(a)fb.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/sched/coredump.h | 1 +
mm/oom_kill.c | 4 ++--
2 files changed, 3 insertions(+), 2 deletions(-)
--- a/include/linux/sched/coredump.h~oom-oom_reaper-do-not-enqueue-same-task-twice
+++ a/include/linux/sched/coredump.h
@@ -71,6 +71,7 @@ static inline int get_dumpable(struct mm
#define MMF_HUGE_ZERO_PAGE 23 /* mm has ever used the global huge zero page */
#define MMF_DISABLE_THP 24 /* disable THP for all VMAs */
#define MMF_OOM_VICTIM 25 /* mm is the oom victim */
+#define MMF_OOM_REAP_QUEUED 26 /* mm was queued for oom_reaper */
#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)
#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
--- a/mm/oom_kill.c~oom-oom_reaper-do-not-enqueue-same-task-twice
+++ a/mm/oom_kill.c
@@ -647,8 +647,8 @@ static int oom_reaper(void *unused)
static void wake_oom_reaper(struct task_struct *tsk)
{
- /* tsk is already queued? */
- if (tsk == oom_reaper_list || tsk->oom_reaper_list)
+ /* mm is already queued? */
+ if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags))
return;
get_task_struct(tsk);
_
From: Jan Kara <jack(a)suse.cz>
Subject: mm: migrate: make buffer_migrate_page_norefs() actually succeed
Currently, buffer_migrate_page_norefs() is constantly failing because
buffer_migrate_lock_buffers() grabs a reference on each buffer. In fact,
there's no reason for buffer_migrate_lock_buffers() to grab any buffer
references, as the page is locked for the whole operation and thus nobody
can reclaim buffers from the page. So remove the grabbing of buffer
references, which also makes buffer_migrate_page_norefs() succeed.
Link: http://lkml.kernel.org/r/20190116131217.7226-1-jack@suse.cz
Fixes: 89cb0888ca14 ("mm: migrate: provide buffer_migrate_page_norefs()")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work(a)gmail.com>
Cc: Pavel Machek <pavel(a)ucw.cz>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Zi Yan <zi.yan(a)cs.rutgers.edu>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 5 -----
1 file changed, 5 deletions(-)
--- a/mm/migrate.c~mm-migrate-make-buffer_migrate_page_norefs-actually-succeed
+++ a/mm/migrate.c
@@ -709,7 +709,6 @@ static bool buffer_migrate_lock_buffers(
/* Simple case, sync compaction */
if (mode != MIGRATE_ASYNC) {
do {
- get_bh(bh);
lock_buffer(bh);
bh = bh->b_this_page;
@@ -720,18 +719,15 @@ static bool buffer_migrate_lock_buffers(
/* async case, we cannot block on lock_buffer so use trylock_buffer */
do {
- get_bh(bh);
if (!trylock_buffer(bh)) {
/*
* We failed to lock the buffer and cannot stall in
* async migration. Release the taken locks
*/
struct buffer_head *failed_bh = bh;
- put_bh(failed_bh);
bh = head;
while (bh != failed_bh) {
unlock_buffer(bh);
- put_bh(bh);
bh = bh->b_this_page;
}
return false;
@@ -818,7 +814,6 @@ unlock_buffers:
bh = head;
do {
unlock_buffer(bh);
- put_bh(bh);
bh = bh->b_this_page;
} while (bh != head);
_
From: Andrei Vagin <avagin(a)gmail.com>
Subject: kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
Currently, exit_ptrace() adds all ptraced tasks to a dead list, then
zap_pid_ns_processes() waits on all tasks in the current pidns, and only
then are the tasks from the dead list released.
zap_pid_ns_processes() can get stuck waiting for tasks from the dead list. In
this case, we end up with one unkillable process with one or more dead
children.
Thanks to Oleg for the advice to release tasks in find_child_reaper().
Link: http://lkml.kernel.org/r/20190110175200.12442-1-avagin@gmail.com
Fixes: 7c8bd2322c7f ("exit: ptrace: shift "reap dead" code from exit_ptrace() to forget_original_parent()")
Signed-off-by: Andrei Vagin <avagin(a)gmail.com>
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/exit.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
--- a/kernel/exit.c~kernel-release-ptraced-tasks-before-zap_pid_ns_processes
+++ a/kernel/exit.c
@@ -558,12 +558,14 @@ static struct task_struct *find_alive_th
return NULL;
}
-static struct task_struct *find_child_reaper(struct task_struct *father)
+static struct task_struct *find_child_reaper(struct task_struct *father,
+ struct list_head *dead)
__releases(&tasklist_lock)
__acquires(&tasklist_lock)
{
struct pid_namespace *pid_ns = task_active_pid_ns(father);
struct task_struct *reaper = pid_ns->child_reaper;
+ struct task_struct *p, *n;
if (likely(reaper != father))
return reaper;
@@ -579,6 +581,12 @@ static struct task_struct *find_child_re
panic("Attempted to kill init! exitcode=0x%08x\n",
father->signal->group_exit_code ?: father->exit_code);
}
+
+ list_for_each_entry_safe(p, n, dead, ptrace_entry) {
+ list_del_init(&p->ptrace_entry);
+ release_task(p);
+ }
+
zap_pid_ns_processes(pid_ns);
write_lock_irq(&tasklist_lock);
@@ -668,7 +676,7 @@ static void forget_original_parent(struc
exit_ptrace(father, dead);
/* Can drop and reacquire tasklist_lock */
- reaper = find_child_reaper(father);
+ reaper = find_child_reaper(father, dead);
if (list_empty(&father->children))
return;
_
From: Andrea Arcangeli <aarcange(a)redhat.com>
Subject: mm/hugetlb.c: teach follow_hugetlb_page() to handle FOLL_NOWAIT
hugetlb needs the same fix as faultin_nopage (which was applied in
96312e61282ae ("mm/gup.c: teach get_user_pages_unlocked to handle
FOLL_NOWAIT")); otherwise KVM hangs, because it thinks the mmap_sem was
already released by hugetlb_fault() when it returned VM_FAULT_RETRY, while
in the FOLL_NOWAIT case it was not released.
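The locking contract involved is easy to misread, so here is a stand-alone
model of it (simplified C with made-up names, not the kernel code):
VM_FAULT_RETRY normally means "the fault handler dropped mmap_sem, retry",
but with the NOWAIT flag the lock is kept, so the caller must not report it
as released - which is what the fix below enforces.

#include <stdio.h>
#include <pthread.h>

#define FAULT_RETRY	1
#define FLAG_NOWAIT	2

static pthread_mutex_t mmap_sem = PTHREAD_MUTEX_INITIALIZER;

/* Model fault handler: returns FAULT_RETRY either after dropping the lock
 * (normal case) or while still holding it (NOWAIT case). */
static int fault(unsigned int flags)
{
	if (flags & FLAG_NOWAIT)
		return FAULT_RETRY;		/* lock NOT released */
	pthread_mutex_unlock(&mmap_sem);	/* lock released before retry */
	return FAULT_RETRY;
}

/* Model caller: *locked may only be cleared when the lock really was
 * dropped; the bug was clearing it in the NOWAIT case as well. */
static void caller(unsigned int flags, int *locked)
{
	if ((fault(flags) & FAULT_RETRY) && !(flags & FLAG_NOWAIT))
		*locked = 0;
}

int main(void)
{
	int locked = 1;

	pthread_mutex_lock(&mmap_sem);
	caller(FLAG_NOWAIT, &locked);
	printf("after NOWAIT retry: locked=%d (lock really is still held)\n",
	       locked);
	pthread_mutex_unlock(&mmap_sem);
	return 0;
}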
Link: http://lkml.kernel.org/r/20190109020203.26669-2-aarcange@redhat.com
Fixes: ce53053ce378 ("kvm: switch get_user_page_nowait() to get_user_pages_unlocked()")
Signed-off-by: Andrea Arcangeli <aarcange(a)redhat.com>
Tested-by: "Dr. David Alan Gilbert" <dgilbert(a)redhat.com>
Reported-by: "Dr. David Alan Gilbert" <dgilbert(a)redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/hugetlb.c~mm-hugetlbc-teach-follow_hugetlb_page-to-handle-foll_nowait
+++ a/mm/hugetlb.c
@@ -4268,7 +4268,8 @@ long follow_hugetlb_page(struct mm_struc
break;
}
if (ret & VM_FAULT_RETRY) {
- if (nonblocking)
+ if (nonblocking &&
+ !(fault_flags & FAULT_FLAG_RETRY_NOWAIT))
*nonblocking = 0;
*nr_pages = 0;
/*
_