November 2017 - Linux-stable-mirror

[Linux-stable-mirror] Please add lkft-triage@lists.linaro.org to your announcement e-mails

by Anmar Oueja

Hello Greg,

Can you please add the lkft-triage(a)lists.linaro.org mailing list to your CC when sending your announcement e-mails?

Thanks,
anmar


[Linux-stable-mirror] Patch "x86/cpu/amd: Derive L3 shared_cpu_map from cpu_llc_shared_mask" has been added to the 4.13-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled

    x86/cpu/amd: Derive L3 shared_cpu_map from cpu_llc_shared_mask

to the 4.13-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…

The filename of the patch is:
     x86-cpu-amd-derive-l3-shared_cpu_map-from-cpu_llc_shared_mask.patch
and it can be found in the queue-4.13 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.


From 2b83809a5e6d619a780876fcaf68cdc42b50d28c Mon Sep 17 00:00:00 2001
From: Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>
Date: Mon, 31 Jul 2017 10:51:59 +0200
Subject: x86/cpu/amd: Derive L3 shared_cpu_map from cpu_llc_shared_mask

From: Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>

commit 2b83809a5e6d619a780876fcaf68cdc42b50d28c upstream.

For systems with X86_FEATURE_TOPOEXT, current logic uses the APIC ID
to calculate shared_cpu_map. However, APIC IDs are not guaranteed to
be contiguous for cores across different L3s (e.g. family17h system
w/ downcore configuration). This breaks the logic, and results in an
incorrect L3 shared_cpu_map.

Instead, always use the previously calculated cpu_llc_shared_mask of
each CPU to derive the L3 shared_cpu_map.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Link: http://lkml.kernel.org/r/20170731085159.9455-3-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>

---
 arch/x86/kernel/cpu/intel_cacheinfo.c |   32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

--- a/arch/x86/kernel/cpu/intel_cacheinfo.c
+++ b/arch/x86/kernel/cpu/intel_cacheinfo.c
@@ -811,7 +811,24 @@ static int __cache_amd_cpumap_setup(unsi
 	struct cacheinfo *this_leaf;
 	int i, sibling;
 
-	if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
+	/*
+	 * For L3, always use the pre-calculated cpu_llc_shared_mask
+	 * to derive shared_cpu_map.
+	 */
+	if (index == 3) {
+		for_each_cpu(i, cpu_llc_shared_mask(cpu)) {
+			this_cpu_ci = get_cpu_cacheinfo(i);
+			if (!this_cpu_ci->info_list)
+				continue;
+			this_leaf = this_cpu_ci->info_list + index;
+			for_each_cpu(sibling, cpu_llc_shared_mask(cpu)) {
+				if (!cpu_online(sibling))
+					continue;
+				cpumask_set_cpu(sibling,
+						&this_leaf->shared_cpu_map);
+			}
+		}
+	} else if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
 		unsigned int apicid, nshared, first, last;
 
 		this_leaf = this_cpu_ci->info_list + index;
@@ -837,19 +854,6 @@ static int __cache_amd_cpumap_setup(unsi
 					continue;
 				cpumask_set_cpu(sibling,
 						&this_leaf->shared_cpu_map);
-			}
-		}
-	} else if (index == 3) {
-		for_each_cpu(i, cpu_llc_shared_mask(cpu)) {
-			this_cpu_ci = get_cpu_cacheinfo(i);
-			if (!this_cpu_ci->info_list)
-				continue;
-			this_leaf = this_cpu_ci->info_list + index;
-			for_each_cpu(sibling, cpu_llc_shared_mask(cpu)) {
-				if (!cpu_online(sibling))
-					continue;
-				cpumask_set_cpu(sibling,
-						&this_leaf->shared_cpu_map);
 			}
 		}
 	} else


Patches currently in stable-queue which might be from suravee.suthikulpanit(a)amd.com are

queue-4.13/x86-cpu-amd-derive-l3-shared_cpu_map-from-cpu_llc_shared_mask.patch
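
The idea behind the L3 branch of this patch can be sketched in plain userspace C. This is only an illustration, not the kernel code: CPU masks are modelled as 32-bit bitmaps, and `NR_CPUS` and `derive_l3_shared_map()` are hypothetical names chosen for the example. It shows that iterating over a precomputed LLC mask works even when the CPU numbers in that mask are not contiguous, which is the case the commit message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 8	/* hypothetical small system for the sketch */

/*
 * For every CPU i in llc_mask, set a bit in shared_map[i] for each
 * sibling that is both in llc_mask and online. No assumption is made
 * that the CPU numbers in llc_mask form a contiguous range.
 */
static void derive_l3_shared_map(uint32_t llc_mask, uint32_t online_mask,
				 uint32_t shared_map[NR_CPUS])
{
	assert(shared_map != NULL);

	for (int i = 0; i < NR_CPUS; i++) {
		if (!(llc_mask & (1u << i)))
			continue;
		for (int sibling = 0; sibling < NR_CPUS; sibling++) {
			if (!(llc_mask & (1u << sibling)))
				continue;
			if (!(online_mask & (1u << sibling)))
				continue;	/* skip offline siblings */
			shared_map[i] |= 1u << sibling;
		}
	}
}
```

With an LLC mask covering CPUs 0, 1 and 3 and only CPUs 0 and 3 online, each member of the mask ends up with a shared map containing exactly CPUs 0 and 3, while CPU 2 is untouched.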


[Linux-stable-mirror] [PATCH] x86/cpu/amd: Derive L3 shared_cpu_map from cpu_llc_shared_mask

by Suravee Suthikulpanit

Upstream commit ID: 2b83809a5e6d619a780876fcaf68cdc42b50d28c
Stable kernel version to apply: 4.13.x
Reason: This patch fixes the L3 topology for the AMD Zen-based (family17h) processors with down-core configuration.

Thanks,
Suravee


[Linux-stable-mirror] Patch "ocfs2: should wait dio before inode lock in ocfs2_setattr()" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled

    ocfs2: should wait dio before inode lock in ocfs2_setattr()

to the 4.9-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…

The filename of the patch is:
     ocfs2-should-wait-dio-before-inode-lock-in-ocfs2_setattr.patch
and it can be found in the queue-4.9 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.


From 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 Mon Sep 17 00:00:00 2001
From: alex chen <alex.chen(a)huawei.com>
Date: Wed, 15 Nov 2017 17:31:40 -0800
Subject: ocfs2: should wait dio before inode lock in ocfs2_setattr()

From: alex chen <alex.chen(a)huawei.com>

commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.

we should wait dio requests to finish before inode lock in
ocfs2_setattr(), otherwise the following deadlock will happen:

process 1                  process 2                    process 3
truncate file 'A'          end_io of writing file 'A'   receiving the bast messages
ocfs2_setattr
 ocfs2_inode_lock_tracker
  ocfs2_inode_lock_full
 inode_dio_wait
  __inode_dio_wait
  -->waiting for all dio
  requests finish
                                                        dlm_proxy_ast_handler
                                                         dlm_do_local_bast
                                                          ocfs2_blocking_ast
                                                           ocfs2_generic_handle_bast
                                                            set OCFS2_LOCK_BLOCKED flag
                           dio_end_io
                            dio_bio_end_aio
                             dio_complete
                              ocfs2_dio_end_io
                               ocfs2_dio_end_io_write
                                ocfs2_inode_lock
                                 __ocfs2_cluster_lock
                                  ocfs2_wait_for_mask
                                  -->waiting for OCFS2_LOCK_BLOCKED
                                  flag to be cleared, that is waiting
                                  for 'process 1' unlocking the inode lock
                              inode_dio_end
                              -->here dec the i_dio_count, but will never
                              be called, so a deadlock happened.

Link: http://lkml.kernel.org/r/59F81636.70508@huawei.com
Signed-off-by: Alex Chen <alex.chen(a)huawei.com>
Reviewed-by: Jun Piao <piaojun(a)huawei.com>
Reviewed-by: Joseph Qi <jiangqi903(a)gmail.com>
Acked-by: Changwei Ge <ge.changwei(a)h3c.com>
Cc: Mark Fasheh <mfasheh(a)versity.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>

---
 fs/ocfs2/file.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1166,6 +1166,13 @@ int ocfs2_setattr(struct dentry *dentry,
 	}
 	size_change = S_ISREG(inode->i_mode) && attr->ia_valid & ATTR_SIZE;
 	if (size_change) {
+		/*
+		 * Here we should wait dio to finish before inode lock
+		 * to avoid a deadlock between ocfs2_setattr() and
+		 * ocfs2_dio_end_io_write()
+		 */
+		inode_dio_wait(inode);
+
 		status = ocfs2_rw_lock(inode, 1);
 		if (status < 0) {
 			mlog_errno(status);
@@ -1186,8 +1193,6 @@ int ocfs2_setattr(struct dentry *dentry,
 		if (status)
 			goto bail_unlock;
 
-		inode_dio_wait(inode);
-
 		if (i_size_read(inode) >= attr->ia_size) {
 			if (ocfs2_should_order_data(inode)) {
 				status = ocfs2_begin_ordered_truncate(inode,


Patches currently in stable-queue which might be from alex.chen(a)huawei.com are

queue-4.9/ocfs2-should-wait-dio-before-inode-lock-in-ocfs2_setattr.patch


[Linux-stable-mirror] Patch "ocfs2: fix cluster hang after a node dies" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled

    ocfs2: fix cluster hang after a node dies

to the 4.9-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…

The filename of the patch is:
     ocfs2-fix-cluster-hang-after-a-node-dies.patch
and it can be found in the queue-4.9 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.


From 1c01967116a678fed8e2c68a6ab82abc8effeddc Mon Sep 17 00:00:00 2001
From: Changwei Ge <ge.changwei(a)h3c.com>
Date: Wed, 15 Nov 2017 17:31:33 -0800
Subject: ocfs2: fix cluster hang after a node dies

From: Changwei Ge <ge.changwei(a)h3c.com>

commit 1c01967116a678fed8e2c68a6ab82abc8effeddc upstream.

When a node dies, other live nodes have to choose a new master for an
existed lock resource mastered by the dead node.

As for ocfs2/dlm implementation, this is done by function -
dlm_move_lockres_to_recovery_list which marks those lock rsources as
DLM_LOCK_RES_RECOVERING and manages them via a list from which DLM
changes lock resource's master later.

So without invoking dlm_move_lockres_to_recovery_list, no master will
be choosed after dlm recovery accomplishment since no lock resource can
be found through ::resource list.

What's worse is that if DLM_LOCK_RES_RECOVERING is not marked for
lock resources mastered a dead node, it will break up synchronization
among nodes.

So invoke dlm_move_lockres_to_recovery_list again.

Fixs: 'commit ee8f7fcbe638 ("ocfs2/dlm: continue to purge recovery lockres when recovery master goes down")'
Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373CED6E0F9@H3CMLB14-…
Signed-off-by: Changwei Ge <ge.changwei(a)h3c.com>
Reported-by: Vitaly Mayatskih <v.mayatskih(a)gmail.com>
Tested-by: Vitaly Mayatskikh <v.mayatskih(a)gmail.com>
Cc: Mark Fasheh <mfasheh(a)versity.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Joseph Qi <jiangqi903(a)gmail.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>

---
 fs/ocfs2/dlm/dlmrecovery.c |    1 +
 1 file changed, 1 insertion(+)

--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -2419,6 +2419,7 @@ static void dlm_do_local_recovery_cleanu
 					dlm_lockres_put(res);
 					continue;
 				}
+				dlm_move_lockres_to_recovery_list(dlm, res);
 			} else if (res->owner == dlm->node_num) {
 				dlm_free_dead_locks(dlm, res, dead_node);
 				__dlm_lockres_calc_usage(dlm, res);


Patches currently in stable-queue which might be from ge.changwei(a)h3c.com are

queue-4.9/ocfs2-fix-cluster-hang-after-a-node-dies.patch
queue-4.9/ocfs2-should-wait-dio-before-inode-lock-in-ocfs2_setattr.patch


[Linux-stable-mirror] Patch "ipmi: fix unsigned long underflow" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled

    ipmi: fix unsigned long underflow

to the 4.9-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…

The filename of the patch is:
     ipmi-fix-unsigned-long-underflow.patch
and it can be found in the queue-4.9 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.


From 392a17b10ec4320d3c0e96e2a23ebaad1123b989 Mon Sep 17 00:00:00 2001
From: Corey Minyard <cminyard(a)mvista.com>
Date: Sat, 29 Jul 2017 21:14:55 -0500
Subject: ipmi: fix unsigned long underflow

From: Corey Minyard <cminyard(a)mvista.com>

commit 392a17b10ec4320d3c0e96e2a23ebaad1123b989 upstream.

When I set the timeout to a specific value such as 500ms, the timeout
event will not happen in time due to the overflow in function
check_msg_timeout:
...
	ent->timeout -= timeout_period;
	if (ent->timeout > 0)
		return;
...

The type of timeout_period is long, but ent->timeout is unsigned long.
This patch makes the type consistent.

Reported-by: Weilong Chen <chenweilong(a)huawei.com>
Signed-off-by: Corey Minyard <cminyard(a)mvista.com>
Tested-by: Weilong Chen <chenweilong(a)huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>

---
 drivers/char/ipmi/ipmi_msghandler.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

--- a/drivers/char/ipmi/ipmi_msghandler.c
+++ b/drivers/char/ipmi/ipmi_msghandler.c
@@ -4029,7 +4029,8 @@ smi_from_recv_msg(ipmi_smi_t intf, struc
 }
 
 static void check_msg_timeout(ipmi_smi_t intf, struct seq_table *ent,
-			      struct list_head *timeouts, long timeout_period,
+			      struct list_head *timeouts,
+			      unsigned long timeout_period,
 			      int slot, unsigned long *flags,
 			      unsigned int *waiting_msgs)
 {
@@ -4042,8 +4043,8 @@ static void check_msg_timeout(ipmi_smi_t
 	if (!ent->inuse)
 		return;
 
-	ent->timeout -= timeout_period;
-	if (ent->timeout > 0) {
+	if (timeout_period < ent->timeout) {
+		ent->timeout -= timeout_period;
 		(*waiting_msgs)++;
 		return;
 	}
@@ -4109,7 +4110,8 @@ static void check_msg_timeout(ipmi_smi_t
 	}
 }
 
-static unsigned int ipmi_timeout_handler(ipmi_smi_t intf, long timeout_period)
+static unsigned int ipmi_timeout_handler(ipmi_smi_t intf,
+					 unsigned long timeout_period)
 {
 	struct list_head timeouts;
 	struct ipmi_recv_msg *msg, *msg2;


Patches currently in stable-queue which might be from cminyard(a)mvista.com are

queue-4.9/ipmi-fix-unsigned-long-underflow.patch
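
The wraparound described in this commit message can be reproduced outside the kernel. The following is a minimal userspace sketch, not the driver code: `struct entry`, `still_waiting_buggy()` and `still_waiting_fixed()` are hypothetical names that mirror the before and after patterns of check_msg_timeout().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the timeout field of struct seq_table. */
struct entry {
	unsigned long timeout;
};

/*
 * Pre-patch pattern: subtract first, then test. The field is unsigned,
 * so a period larger than the remaining time wraps to a huge value and
 * the "> 0" test keeps reporting that the entry is still waiting.
 */
static bool still_waiting_buggy(struct entry *ent, long timeout_period)
{
	assert(ent != NULL);
	ent->timeout -= timeout_period;
	return ent->timeout > 0;
}

/*
 * Patched pattern: compare before subtracting, so the subtraction can
 * never wrap. Returns false once the entry has timed out.
 */
static bool still_waiting_fixed(struct entry *ent, unsigned long timeout_period)
{
	assert(ent != NULL);
	if (timeout_period < ent->timeout) {
		ent->timeout -= timeout_period;
		return true;
	}
	return false;
}
```

Starting from a remaining timeout of 500 and a period of 1000, the first helper still reports the entry as waiting, while the second reports it as expired.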


[Linux-stable-mirror] Patch "mm/page_alloc.c: broken deferred calculation" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled

    mm/page_alloc.c: broken deferred calculation

to the 4.9-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…

The filename of the patch is:
     mm-page_alloc.c-broken-deferred-calculation.patch
and it can be found in the queue-4.9 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.


From d135e5750205a21a212a19dbb05aeb339e2cbea7 Mon Sep 17 00:00:00 2001
From: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Date: Wed, 15 Nov 2017 17:38:41 -0800
Subject: mm/page_alloc.c: broken deferred calculation

From: Pavel Tatashin <pasha.tatashin(a)oracle.com>

commit d135e5750205a21a212a19dbb05aeb339e2cbea7 upstream.

In reset_deferred_meminit() we determine number of pages that must not
be deferred. We initialize pages for at least 2G of memory, but also
pages for reserved memory in this node.

The reserved memory is determined in this function:
memblock_reserved_memory_within(), which operates over physical
addresses, and returns size in bytes. However, reset_deferred_meminit()
assumes that that this function operates with pfns, and returns page
count.

The result is that in the best case machine boots slower than expected
due to initializing more pages than needed in single thread, and in the
worst case panics because fewer than needed pages are initialized early.

Link: http://lkml.kernel.org/r/20171021011707.15191-1-pasha.tatashin@oracle.com
Fixes: 864b9a393dcb ("mm: consider memblock reservations for deferred memory initialization sizing")
Signed-off-by: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>

---
 include/linux/mmzone.h |    3 ++-
 mm/page_alloc.c        |   27 ++++++++++++++++++---------
 2 files changed, 20 insertions(+), 10 deletions(-)

--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -672,7 +672,8 @@ typedef struct pglist_data {
 	 * is the first PFN that needs to be initialised.
 	 */
 	unsigned long first_deferred_pfn;
-	unsigned long static_init_size;
+	/* Number of non-deferred pages */
+	unsigned long static_init_pgcnt;
 #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -284,28 +284,37 @@ EXPORT_SYMBOL(nr_online_nodes);
 int page_group_by_mobility_disabled __read_mostly;
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+
+/*
+ * Determine how many pages need to be initialized durig early boot
+ * (non-deferred initialization).
+ * The value of first_deferred_pfn will be set later, once non-deferred pages
+ * are initialized, but for now set it ULONG_MAX.
+ */
 static inline void reset_deferred_meminit(pg_data_t *pgdat)
 {
-	unsigned long max_initialise;
-	unsigned long reserved_lowmem;
+	phys_addr_t start_addr, end_addr;
+	unsigned long max_pgcnt;
+	unsigned long reserved;
 
 	/*
 	 * Initialise at least 2G of a node but also take into account that
 	 * two large system hashes that can take up 1GB for 0.25TB/node.
 	 */
-	max_initialise = max(2UL << (30 - PAGE_SHIFT),
-		(pgdat->node_spanned_pages >> 8));
+	max_pgcnt = max(2UL << (30 - PAGE_SHIFT),
+			(pgdat->node_spanned_pages >> 8));
 
 	/*
 	 * Compensate the all the memblock reservations (e.g. crash kernel)
 	 * from the initial estimation to make sure we will initialize enough
 	 * memory to boot.
 	 */
-	reserved_lowmem = memblock_reserved_memory_within(pgdat->node_start_pfn,
-			pgdat->node_start_pfn + max_initialise);
-	max_initialise += reserved_lowmem;
+	start_addr = PFN_PHYS(pgdat->node_start_pfn);
+	end_addr = PFN_PHYS(pgdat->node_start_pfn + max_pgcnt);
+	reserved = memblock_reserved_memory_within(start_addr, end_addr);
+	max_pgcnt += PHYS_PFN(reserved);
 
-	pgdat->static_init_size = min(max_initialise, pgdat->node_spanned_pages);
+	pgdat->static_init_pgcnt = min(max_pgcnt, pgdat->node_spanned_pages);
 	pgdat->first_deferred_pfn = ULONG_MAX;
 }
 
@@ -332,7 +341,7 @@ static inline bool update_defer_init(pg_
 	if (zone_end < pgdat_end_pfn(pgdat))
 		return true;
 	(*nr_initialised)++;
-	if ((*nr_initialised > pgdat->static_init_size) &&
+	if ((*nr_initialised > pgdat->static_init_pgcnt) &&
 	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
 		pgdat->first_deferred_pfn = pfn;
 		return false;


Patches currently in stable-queue which might be from pasha.tatashin(a)oracle.com are

queue-4.9/mm-page_alloc.c-broken-deferred-calculation.patch
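
The unit mix-up this patch fixes (page frame numbers versus physical byte addresses) can be shown with a small userspace sketch. It rests on several assumptions that are not in the mail: 4 KiB pages (`PAGE_SHIFT` of 12), a single reserved region, and the hypothetical helpers `reserved_bytes_within()` and `static_init_pgcnt()`. The `PFN_PHYS` and `PHYS_PFN` macros are redefined locally to mirror the kernel macros of the same name, and the final clamp against node_spanned_pages is left out.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

typedef uint64_t phys_addr_t;

/* Local copies of the pfn <-> physical address conversions. */
#define PFN_PHYS(pfn)	((phys_addr_t)(pfn) << PAGE_SHIFT)
#define PHYS_PFN(addr)	((unsigned long)((addr) >> PAGE_SHIFT))

/* Hypothetical single reserved region, in bytes. */
struct region {
	phys_addr_t base;
	phys_addr_t size;
};

/* Bytes of r that fall inside [start, end): addresses in, bytes out. */
static phys_addr_t reserved_bytes_within(const struct region *r,
					 phys_addr_t start, phys_addr_t end)
{
	assert(end >= start);
	phys_addr_t lo = r->base > start ? r->base : start;
	phys_addr_t hi = r->base + r->size < end ? r->base + r->size : end;

	return hi > lo ? hi - lo : 0;
}

/* Page count to initialise early: convert pfns to bytes and back. */
static unsigned long static_init_pgcnt(unsigned long node_start_pfn,
				       unsigned long max_pgcnt,
				       const struct region *r)
{
	phys_addr_t start_addr = PFN_PHYS(node_start_pfn);
	phys_addr_t end_addr = PFN_PHYS(node_start_pfn + max_pgcnt);
	phys_addr_t reserved = reserved_bytes_within(r, start_addr, end_addr);

	return max_pgcnt + PHYS_PFN(reserved);
}
```

Under these assumptions, a 1 MiB reservation inside the first 2 GiB adds 256 pages to the 524288 pages that cover 2 GiB. Passing pfns straight to a byte-based helper, as the old code did, would have queried a window 4096 times too small and then added a byte count to a page count.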


[Linux-stable-mirror] Patch "dmaengine: dmatest: warn user when dma test times out" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled

    dmaengine: dmatest: warn user when dma test times out

to the 4.9-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…

The filename of the patch is:
     dmaengine-dmatest-warn-user-when-dma-test-times-out.patch
and it can be found in the queue-4.9 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.


From a9df21e34b422f79d9a9fa5c3eff8c2a53491be6 Mon Sep 17 00:00:00 2001
From: Adam Wallis <awallis(a)codeaurora.org>
Date: Thu, 2 Nov 2017 08:53:30 -0400
Subject: dmaengine: dmatest: warn user when dma test times out

From: Adam Wallis <awallis(a)codeaurora.org>

commit a9df21e34b422f79d9a9fa5c3eff8c2a53491be6 upstream.

Commit adfa543e7314 ("dmatest: don't use set_freezable_with_signal()")
introduced a bug (that is in fact documented by the patch commit text)
that leaves behind a dangling pointer. Since the done_wait structure is
allocated on the stack, future invocations to the DMATEST can produce
undesirable results (e.g., corrupted spinlocks).

Ideally, this would be cleaned up in the thread handler, but at the very
least, the kernel is left in a very precarious scenario that can lead to
some long debug sessions when the crash comes later.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=197605
Signed-off-by: Adam Wallis <awallis(a)codeaurora.org>
Signed-off-by: Vinod Koul <vinod.koul(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>

---
 drivers/dma/dmatest.c |    1 +
 1 file changed, 1 insertion(+)

--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -666,6 +666,7 @@ static int dmatest_func(void *data)
 			 * free it this time?" dancing. For now, just
 			 * leave it dangling.
 			 */
+			WARN(1, "dmatest: Kernel stack may be corrupted!!\n");
 			dmaengine_unmap_put(um);
 			result("test timed out", total_tests, src_off, dst_off,
 			       len, 0);


Patches currently in stable-queue which might be from awallis(a)codeaurora.org are

queue-4.9/dmaengine-dmatest-warn-user-when-dma-test-times-out.patch


[Linux-stable-mirror] Patch "ocfs2: should wait dio before inode lock in ocfs2_setattr()" has been added to the 4.4-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled

    ocfs2: should wait dio before inode lock in ocfs2_setattr()

to the 4.4-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…

The filename of the patch is:
     ocfs2-should-wait-dio-before-inode-lock-in-ocfs2_setattr.patch
and it can be found in the queue-4.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.


From 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 Mon Sep 17 00:00:00 2001
From: alex chen <alex.chen(a)huawei.com>
Date: Wed, 15 Nov 2017 17:31:40 -0800
Subject: ocfs2: should wait dio before inode lock in ocfs2_setattr()

From: alex chen <alex.chen(a)huawei.com>

commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.

we should wait dio requests to finish before inode lock in
ocfs2_setattr(), otherwise the following deadlock will happen:

process 1                  process 2                    process 3
truncate file 'A'          end_io of writing file 'A'   receiving the bast messages
ocfs2_setattr
 ocfs2_inode_lock_tracker
  ocfs2_inode_lock_full
 inode_dio_wait
  __inode_dio_wait
  -->waiting for all dio
  requests finish
                                                        dlm_proxy_ast_handler
                                                         dlm_do_local_bast
                                                          ocfs2_blocking_ast
                                                           ocfs2_generic_handle_bast
                                                            set OCFS2_LOCK_BLOCKED flag
                           dio_end_io
                            dio_bio_end_aio
                             dio_complete
                              ocfs2_dio_end_io
                               ocfs2_dio_end_io_write
                                ocfs2_inode_lock
                                 __ocfs2_cluster_lock
                                  ocfs2_wait_for_mask
                                  -->waiting for OCFS2_LOCK_BLOCKED
                                  flag to be cleared, that is waiting
                                  for 'process 1' unlocking the inode lock
                              inode_dio_end
                              -->here dec the i_dio_count, but will never
                              be called, so a deadlock happened.

Link: http://lkml.kernel.org/r/59F81636.70508@huawei.com
Signed-off-by: Alex Chen <alex.chen(a)huawei.com>
Reviewed-by: Jun Piao <piaojun(a)huawei.com>
Reviewed-by: Joseph Qi <jiangqi903(a)gmail.com>
Acked-by: Changwei Ge <ge.changwei(a)h3c.com>
Cc: Mark Fasheh <mfasheh(a)versity.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>

---
 fs/ocfs2/file.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1166,6 +1166,13 @@ int ocfs2_setattr(struct dentry *dentry,
 	}
 	size_change = S_ISREG(inode->i_mode) && attr->ia_valid & ATTR_SIZE;
 	if (size_change) {
+		/*
+		 * Here we should wait dio to finish before inode lock
+		 * to avoid a deadlock between ocfs2_setattr() and
+		 * ocfs2_dio_end_io_write()
+		 */
+		inode_dio_wait(inode);
+
 		status = ocfs2_rw_lock(inode, 1);
 		if (status < 0) {
 			mlog_errno(status);
@@ -1186,8 +1193,6 @@ int ocfs2_setattr(struct dentry *dentry,
 		if (status)
 			goto bail_unlock;
 
-		inode_dio_wait(inode);
-
 		if (i_size_read(inode) >= attr->ia_size) {
 			if (ocfs2_should_order_data(inode)) {
 				status = ocfs2_begin_ordered_truncate(inode,


Patches currently in stable-queue which might be from alex.chen(a)huawei.com are

queue-4.4/ocfs2-should-wait-dio-before-inode-lock-in-ocfs2_setattr.patch


[Linux-stable-mirror] [PATCH-stable] nvme: Fix memory order on async queue deletion

by Keith Busch

This patch is a fix specific to the 3.19 - 4.4 kernels. The 4.5 kernel
inadvertently fixed this bug differently (db3cbfff5bcc0), but is not
a stable candidate due to it being a complicated re-write of the entire
feature.

This patch fixes a potential timing bug with nvme's asynchronous queue
deletion, which causes an allocated request to be accidentally released
due to the ordering of the shared completion context among the sq/cq
pair. The completion context saves the request that issued the queue
deletion. If the submission side deletion happens to reset the active
request, the completion side will release the wrong request tag back into
the pool of available tags. This means the driver will create multiple
commands with the same tag, corrupting the queue context.

The error is observable in the kernel logs like:

  "nvme XX:YY:ZZ completed id XX twice on qid:0"

In this particular case, this message occurs because the queue is
corrupted.

The following timing sequence demonstrates the error:

  CPU A                                 CPU B
  -----------------------               -----------------------------
  nvme_irq
   nvme_process_cq
    async_completion
     queue_kthread_work  ----------->   nvme_del_sq_work_handler
                                         nvme_delete_cq
                                          adapter_async_del_queue
                                           nvme_submit_admin_async_cmd
                                            cmdinfo->req = req;

     blk_mq_free_request(cmdinfo->req); <-- wrong request!!!

This patch fixes the bug by releasing the request in the completion side
prior to waking the submission thread, such that that thread can't muck
with the shared completion context.

Fixes: a4aea5623d4a5 ("NVMe: Convert to blk-mq")

Cc: <stable(a)vger.kernel.org> # 4.4.x
Cc: <stable(a)vger.kernel.org> # 4.1.x
Signed-off-by: Keith Busch <keith.busch(a)intel.com>
---
 drivers/nvme/host/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 669edbd47602..d6ceb8b91cd6 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -350,8 +350,8 @@ static void async_completion(struct nvme_queue *nvmeq, void *ctx,
 	struct async_cmd_info *cmdinfo = ctx;
 	cmdinfo->result = le32_to_cpup(&cqe->result);
 	cmdinfo->status = le16_to_cpup(&cqe->status) >> 1;
-	queue_kthread_work(cmdinfo->worker, &cmdinfo->work);
 	blk_mq_free_request(cmdinfo->req);
+	queue_kthread_work(cmdinfo->worker, &cmdinfo->work);
 }
 
 static inline struct nvme_cmd_info *get_cmd_from_tag(struct nvme_queue *nvmeq,
-- 
2.13.6
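
The ordering problem can be modelled in a single thread by letting the "worker" run at the instant it is queued, which is the worst-case interleaving from the timing sequence in the mail. This is a simplified sketch, not driver code: `struct request`, `worker_runs()`, `free_request()` and the two completion helpers are hypothetical stand-ins for the blk-mq request, the kthread work item, blk_mq_free_request() and async_completion().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request with a tag and a flag recording its release. */
struct request {
	int tag;
	int freed;
};

/* Completion context shared by the sq/cq deletion pair. */
struct async_cmd_info {
	struct request *req;
};

/* Models the worker submitting its next command: it overwrites req. */
static void worker_runs(struct async_cmd_info *c, struct request *next)
{
	c->req = next;
}

static void free_request(struct request *r)
{
	assert(r != NULL);
	r->freed = 1;
}

/* Pre-patch order: wake the worker first, then free whatever req holds. */
static void completion_buggy(struct async_cmd_info *c, struct request *next)
{
	worker_runs(c, next);
	free_request(c->req);	/* releases the worker's new request */
}

/* Patched order: free the completed request before waking the worker. */
static void completion_fixed(struct async_cmd_info *c, struct request *next)
{
	free_request(c->req);	/* releases the request that completed */
	worker_runs(c, next);
}
```

In the first ordering the request that actually completed is never released and the newly submitted one is released while still in flight; in the second ordering only the completed request is released.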

