mstc->port isn't validated here, so it could be NULL or worse by the
time we access it. And drivers aren't ever supposed to be looking at
its contents anyway. Plus, we can already get the MST manager from
&mstc->mstm->mgr.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/nouveau/dispnv50/disp.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index e6f72ca0b1fa..66c40b56a0cb 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -893,7 +893,8 @@ nv50_mstc_get_modes(struct drm_connector *connector)
struct nv50_mstc *mstc = nv50_mstc(connector);
int ret = 0;
- mstc->edid = drm_dp_mst_get_edid(&mstc->connector, mstc->port->mgr, mstc->port);
+ mstc->edid = drm_dp_mst_get_edid(&mstc->connector,
+ &mstc->mstm->mgr, mstc->port);
drm_connector_update_edid_property(&mstc->connector, mstc->edid);
if (mstc->edid)
ret = drm_add_edid_modes(&mstc->connector, mstc->edid);
--
2.19.1
From: Roman Gushchin <guro(a)fb.com>
Subject: mm: don't reclaim inodes with many attached pages
Spock reported that the commit 172b06c32b94 ("mm: slowly shrink slabs with
a relatively small number of objects") leads to a regression on his setup:
periodically the majority of the pagecache is evicted without an obvious
reason, while before the change the amount of free memory was balancing
around the watermark.
The reason behind it is that the change mentioned above created some
minimal background pressure on the inode cache. The problem is that
when an inode is reclaimed, all of its attached pagecache pages are
stripped, no matter how many of them there are. So, if a huge
multi-gigabyte file is cached in memory, and the goal is to reclaim
only a few slab objects (unused inodes), we can still end up evicting
all those gigabytes of pagecache at once.
The workload described by Spock has a few large non-mapped files in
the pagecache, so it's especially noticeable.
To solve the problem, let's postpone the reclaim of inodes which have
more than one attached page. Let's wait until the pagecache pages have
been evicted naturally by scanning the corresponding LRU lists, and
only then reclaim the inode structure.
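The resulting isolate logic is sketched below (the condition mirrors
the diff that follows; the comments are explanatory, and LRU_ROTATE
moves the inode back to the tail of the LRU instead of reclaiming it):
	/*
	 * Inodes with more than one attached page are rotated back to
	 * the LRU tail, so their pagecache is reclaimed page-by-page
	 * via the page LRUs before the inode itself is reclaimed.
	 */
	if (inode->i_state & I_REFERENCED || inode->i_data.nrpages > 1) {
		inode->i_state &= ~I_REFERENCED;
		spin_unlock(&inode->i_lock);
		return LRU_ROTATE;
	}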
Link: http://lkml.kernel.org/r/20181023164302.20436-1-guro@fb.com
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Reported-by: Spock <dairinin(a)gmail.com>
Tested-by: Spock <dairinin(a)gmail.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [4.19.x]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/fs/inode.c~mm-dont-reclaim-inodes-with-many-attached-pages
+++ a/fs/inode.c
@@ -730,8 +730,11 @@ static enum lru_status inode_lru_isolate
return LRU_REMOVED;
}
- /* recently referenced inodes get one more pass */
- if (inode->i_state & I_REFERENCED) {
+ /*
+ * Recently referenced inodes and inodes with many attached pages
+ * get one more pass.
+ */
+ if (inode->i_state & I_REFERENCED || inode->i_data.nrpages > 1) {
inode->i_state &= ~I_REFERENCED;
spin_unlock(&inode->i_lock);
return LRU_ROTATE;
_
From: Arnd Bergmann <arnd(a)arndb.de>
Subject: lib/ubsan.c: don't mark __ubsan_handle_builtin_unreachable as noreturn
gcc-8 complains about the prototype for this function:
lib/ubsan.c:432:1: error: ignoring attribute 'noreturn' in declaration of a built-in function '__ubsan_handle_builtin_unreachable' because it conflicts with attribute 'const' [-Werror=attributes]
This is actually a GCC bug: in the GCC internals,
__ubsan_handle_builtin_unreachable() is declared with both the
'noreturn' and 'const' attributes instead of only 'noreturn':
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84210
Work around this by removing the noreturn attribute.
[aryabinin: add information about GCC bug in changelog]
Link: http://lkml.kernel.org/r/20181107144516.4587-1-aryabinin@virtuozzo.com
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Signed-off-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Acked-by: Olof Johansson <olof(a)lixom.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/lib/ubsan.c~ubsan-dont-mark-__ubsan_handle_builtin_unreachable-as-noreturn
+++ a/lib/ubsan.c
@@ -427,8 +427,7 @@ void __ubsan_handle_shift_out_of_bounds(
EXPORT_SYMBOL(__ubsan_handle_shift_out_of_bounds);
-void __noreturn
-__ubsan_handle_builtin_unreachable(struct unreachable_data *data)
+void __ubsan_handle_builtin_unreachable(struct unreachable_data *data)
{
unsigned long flags;
_
From: Wengang Wang <wen.gang.wang(a)oracle.com>
Subject: ocfs2: free up write context when direct IO failed
The write context should be freed even when direct IO fails.
Otherwise a memory leak is introduced, and entries remain in
oi->ip_unwritten_list, causing the following BUG later in the unlink
path:
ERROR: bug expression: !list_empty(&oi->ip_unwritten_list)
ERROR: Clear inode of 215043, inode has unwritten extents
...
Call Trace:
? __set_current_blocked+0x42/0x68
ocfs2_evict_inode+0x91/0x6a0 [ocfs2]
? bit_waitqueue+0x40/0x33
evict+0xdb/0x1af
iput+0x1a2/0x1f7
do_unlinkat+0x194/0x28f
SyS_unlinkat+0x1b/0x2f
do_syscall_64+0x79/0x1ae
entry_SYSCALL_64_after_hwframe+0x151/0x0
This patch also logs direct IO failures, with rate limiting.
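The new mlog_ratelimited() helper follows the standard kernel
ratelimit pattern (cf. printk_ratelimited()): each call site gets its
own static ratelimit state, so at most DEFAULT_RATELIMIT_BURST
messages are emitted per DEFAULT_RATELIMIT_INTERVAL. A minimal usage
sketch, mirroring the call site added below:
	/* at most a burst of messages per interval, per call site */
	mlog_ratelimited(ML_ERROR, "Direct IO failed, bytes = %lld",
			 (long long)bytes);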
Link: http://lkml.kernel.org/r/20181102170632.25921-1-wen.gang.wang@oracle.com
Signed-off-by: Wengang Wang <wen.gang.wang(a)oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi(a)oracle.com>
Reviewed-by: Changwei Ge <ge.changwei(a)h3c.com>
Reviewed-by: Joseph Qi <jiangqi903(a)gmail.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/fs/ocfs2/aops.c~ocfs2-free-up-write-context-when-direct-io-failed
+++ a/fs/ocfs2/aops.c
@@ -2411,8 +2411,16 @@ static int ocfs2_dio_end_io(struct kiocb
/* this io's submitter should not have unlocked this before we could */
BUG_ON(!ocfs2_iocb_is_rw_locked(iocb));
- if (bytes > 0 && private)
- ret = ocfs2_dio_end_io_write(inode, private, offset, bytes);
+ if (bytes <= 0)
+ mlog_ratelimited(ML_ERROR, "Direct IO failed, bytes = %lld",
+ (long long)bytes);
+ if (private) {
+ if (bytes > 0)
+ ret = ocfs2_dio_end_io_write(inode, private, offset,
+ bytes);
+ else
+ ocfs2_dio_free_write_ctx(inode, private);
+ }
ocfs2_iocb_clear_rw_locked(iocb);
--- a/fs/ocfs2/cluster/masklog.h~ocfs2-free-up-write-context-when-direct-io-failed
+++ a/fs/ocfs2/cluster/masklog.h
@@ -178,6 +178,15 @@ do { \
##__VA_ARGS__); \
} while (0)
+#define mlog_ratelimited(mask, fmt, ...) \
+do { \
+ static DEFINE_RATELIMIT_STATE(_rs, \
+ DEFAULT_RATELIMIT_INTERVAL, \
+ DEFAULT_RATELIMIT_BURST); \
+ if (__ratelimit(&_rs)) \
+ mlog(mask, fmt, ##__VA_ARGS__); \
+} while (0)
+
#define mlog_errno(st) ({ \
int _st = (st); \
if (_st != -ERESTARTSYS && _st != -EINTR && \
_
From: Vasily Averin <vvs(a)virtuozzo.com>
Subject: mm/swapfile.c: use kvzalloc for swap_info_struct allocation
a2468cc9bfdf ("swap: choose swap device according to numa node") changed
'avail_lists' field of 'struct swap_info_struct' to an array. In popular
linux distros it increased size of swap_info_struct up to 40 Kbytes and
now swap_info_struct allocation requires order-4 page. Switch to
kvzmalloc allows to avoid unexpected allocation failures.
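Note that memory allocated with kvzalloc() may come from vmalloc()
when the kmalloc() attempt fails, so it must be released with
kvfree() rather than kfree(); hence the matching kfree() -> kvfree()
conversions below. A minimal sketch of the pattern:
	/* try kmalloc() first, fall back to vmalloc() on failure */
	p = kvzalloc(sizeof(*p), GFP_KERNEL);
	if (!p)
		return ERR_PTR(-ENOMEM);
	/* ... */
	kvfree(p);	/* handles both kmalloc'ed and vmalloc'ed memory */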
Link: http://lkml.kernel.org/r/fc23172d-3c75-21e2-d551-8b1808cbe593@virtuozzo.com
Fixes: a2468cc9bfdf ("swap: choose swap device according to numa node")
Signed-off-by: Vasily Averin <vvs(a)virtuozzo.com>
Acked-by: Aaron Lu <aaron.lu(a)intel.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/mm/swapfile.c~mm-use-kvzalloc-for-swap_info_struct-allocation
+++ a/mm/swapfile.c
@@ -2813,7 +2813,7 @@ static struct swap_info_struct *alloc_sw
unsigned int type;
int i;
- p = kzalloc(sizeof(*p), GFP_KERNEL);
+ p = kvzalloc(sizeof(*p), GFP_KERNEL);
if (!p)
return ERR_PTR(-ENOMEM);
@@ -2824,7 +2824,7 @@ static struct swap_info_struct *alloc_sw
}
if (type >= MAX_SWAPFILES) {
spin_unlock(&swap_lock);
- kfree(p);
+ kvfree(p);
return ERR_PTR(-EPERM);
}
if (type >= nr_swapfiles) {
@@ -2838,7 +2838,7 @@ static struct swap_info_struct *alloc_sw
smp_wmb();
nr_swapfiles++;
} else {
- kfree(p);
+ kvfree(p);
p = swap_info[type];
/*
* Do not memset this entry: a racing procfs swap_next()
_
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlbfs: fix kernel BUG at fs/hugetlbfs/inode.c:444!
This bug has been experienced several times by the Oracle DB team.
The BUG is in remove_inode_hugepages() as follows:
/*
* If page is mapped, it was faulted in after being
* unmapped in caller. Unmap (again) now after taking
* the fault mutex. The mutex will prevent faults
* until we finish removing the page.
*
* This race can only happen in the hole punch case.
* Getting here in a truncate operation is a bug.
*/
if (unlikely(page_mapped(page))) {
BUG_ON(truncate_op);
In this case, the elevated map count is not the result of a race. Rather
it was incorrectly incremented as the result of a bug in the huge pmd
sharing code. Consider the following:
- Process A maps a hugetlbfs file of sufficient size and alignment
(PUD_SIZE) that a pmd page could be shared.
- Process B maps the same hugetlbfs file with the same size and alignment
such that a pmd page is shared.
- Process B then calls mprotect() to change protections for the mapping
with the shared pmd. As a result, the pmd is 'unshared'.
- Process B then calls mprotect() again to change protections for the
mapping back to their original value. The pmd remains unshared.
- Process B then forks and process C is created. During the fork process,
we do dup_mm -> dup_mmap -> copy_page_range to copy page tables. Copying
page tables for hugetlb mappings is done in the routine
copy_hugetlb_page_range.
In copy_hugetlb_page_range(), the destination pte is obtained by:
dst_pte = huge_pte_alloc(dst, addr, sz);
If pmd sharing is possible, the returned pointer will be to a pte in an
existing page table. In the situation above, process C could share with
either process A or process B. Since process A is first in the list, the
returned pte is a pointer to a pte in process A's page table.
However, the following check for pmd sharing is in
copy_hugetlb_page_range.
/* If the pagetables are shared don't copy or take references */
if (dst_pte == src_pte)
continue;
Since process C is sharing with process A instead of process B, the above
test fails. The code in copy_hugetlb_page_range which follows assumes
dst_pte points to a huge_pte_none pte. It copies the pte entry from
src_pte to dst_pte and increments the map count of the associated page.
This is how we end up with an elevated map count.
To solve this, check the dst_pte entry for huge_pte_none(). If it is
not none, this implies PMD sharing, so do not copy.
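In other words, the unlocked fast-path check becomes (see the diff
below):
	dst_entry = huge_ptep_get(dst_pte);
	if ((dst_pte == src_pte) || !huge_pte_none(dst_entry))
		continue;	/* dst already populated => shared pmd */
The same condition is re-checked after taking the page table lock,
since the dst entry could be populated in between.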
Link: http://lkml.kernel.org/r/20181105212315.14125-1-mike.kravetz@oracle.com
Fixes: c5c99429fa57 ("fix hugepages leak due to pagetable page sharing")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Prakash Sangappa <prakash.sangappa(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/mm/hugetlb.c~hugetlbfs-fix-kernel-bug-at-fs-hugetlbfs-inodec-444
+++ a/mm/hugetlb.c
@@ -3233,7 +3233,7 @@ static int is_hugetlb_entry_hwpoisoned(p
int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
struct vm_area_struct *vma)
{
- pte_t *src_pte, *dst_pte, entry;
+ pte_t *src_pte, *dst_pte, entry, dst_entry;
struct page *ptepage;
unsigned long addr;
int cow;
@@ -3261,15 +3261,30 @@ int copy_hugetlb_page_range(struct mm_st
break;
}
- /* If the pagetables are shared don't copy or take references */
- if (dst_pte == src_pte)
+ /*
+ * If the pagetables are shared don't copy or take references.
+ * dst_pte == src_pte is the common case of src/dest sharing.
+ *
+ * However, src could have 'unshared' and dst shares with
+ * another vma. If dst_pte !none, this implies sharing.
+ * Check here before taking page table lock, and once again
+ * after taking the lock below.
+ */
+ dst_entry = huge_ptep_get(dst_pte);
+ if ((dst_pte == src_pte) || !huge_pte_none(dst_entry))
continue;
dst_ptl = huge_pte_lock(h, dst, dst_pte);
src_ptl = huge_pte_lockptr(h, src, src_pte);
spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
entry = huge_ptep_get(src_pte);
- if (huge_pte_none(entry)) { /* skip none entry */
+ dst_entry = huge_ptep_get(dst_pte);
+ if (huge_pte_none(entry) || !huge_pte_none(dst_entry)) {
+ /*
+ * Skip if src entry none. Also, skip in the
+ * unlikely case dst entry !none as this implies
+ * sharing with another vma.
+ */
;
} else if (unlikely(is_hugetlb_entry_migration(entry) ||
is_hugetlb_entry_hwpoisoned(entry))) {
_
From: Kan Liang <kan.liang(a)linux.intel.com>
The client IMC bandwidth events return very large, bogus results:
perf stat -e uncore_imc/data_reads/ -e uncore_imc/data_writes/ -I 10000 -a
10.000117222 34,788.76 MiB uncore_imc/data_reads/
10.000117222 8.26 MiB uncore_imc/data_writes/
20.000374584 34,842.89 MiB uncore_imc/data_reads/
20.000374584 10.45 MiB uncore_imc/data_writes/
30.000633299 37,965.29 MiB uncore_imc/data_reads/
30.000633299 323.62 MiB uncore_imc/data_writes/
40.000891548 41,012.88 MiB uncore_imc/data_reads/
40.000891548 6.98 MiB uncore_imc/data_writes/
50.001142480 1,125,899,906,621,494.75 MiB uncore_imc/data_reads/
50.001142480 6.97 MiB uncore_imc/data_writes/
The client IMC events are freerunning counters. They still use the
old event encoding format (0x1 for data_reads and 0x2 for
data_writes). The counter bit width is calculated by common code,
which assumes that the standard encoding format is used for the
freerunning counters. As a result, incorrect bit width information is
calculated. The event->attr.config, which comes directly from user
space, should not be used by the freerunning counter functions.
For client IMC events, the attr.config needs to be converted to the
standard encoding format. The modified event config will be stored in
event->hw.config.
For other freerunning counters, the attr.config has the correct format.
Just save it in event->hw.config.
Use event->hw.config instead of event->attr.config in the freerunning
counter functions.
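For example (a sketch; the field layout assumed here is the standard
freerunning encoding, with the counter index in config bits 11:8):
	/* convert the legacy client IMC config to standard format */
	event->hw.config = ((cfg - 1) << 8) | 0x10ff;
	/* cfg = 0x1 (data_reads)  -> hw.config = 0x10ff -> idx 0 */
	/* cfg = 0x2 (data_writes) -> hw.config = 0x11ff -> idx 1 */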
Fixes: 9aae1780e7e8 ("perf/x86/intel/uncore: Clean up client IMC uncore")
Reported-by: Jin Yao <yao.jin(a)linux.intel.com>
Tested-by: Jin Yao <yao.jin(a)linux.intel.com>
Signed-off-by: Kan Liang <kan.liang(a)linux.intel.com>
---
arch/x86/events/intel/uncore.c | 1 +
arch/x86/events/intel/uncore.h | 12 ++++++------
arch/x86/events/intel/uncore_snb.c | 4 +++-
3 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 27a461414b30..2690135bf83f 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -740,6 +740,7 @@ static int uncore_pmu_event_init(struct perf_event *event)
/* fixed counters have event field hardcoded to zero */
hwc->config = 0ULL;
} else if (is_freerunning_event(event)) {
+ hwc->config = event->attr.config;
if (!check_valid_freerunning_event(box, event))
return -EINVAL;
event->hw.idx = UNCORE_PMC_IDX_FREERUNNING;
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index e17ab885b1e9..cc6dd4f78158 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -285,8 +285,8 @@ static inline
unsigned int uncore_freerunning_counter(struct intel_uncore_box *box,
struct perf_event *event)
{
- unsigned int type = uncore_freerunning_type(event->attr.config);
- unsigned int idx = uncore_freerunning_idx(event->attr.config);
+ unsigned int type = uncore_freerunning_type(event->hw.config);
+ unsigned int idx = uncore_freerunning_idx(event->hw.config);
struct intel_uncore_pmu *pmu = box->pmu;
return pmu->type->freerunning[type].counter_base +
@@ -360,7 +360,7 @@ static inline
unsigned int uncore_freerunning_bits(struct intel_uncore_box *box,
struct perf_event *event)
{
- unsigned int type = uncore_freerunning_type(event->attr.config);
+ unsigned int type = uncore_freerunning_type(event->hw.config);
return box->pmu->type->freerunning[type].bits;
}
@@ -368,7 +368,7 @@ unsigned int uncore_freerunning_bits(struct intel_uncore_box *box,
static inline int uncore_num_freerunning(struct intel_uncore_box *box,
struct perf_event *event)
{
- unsigned int type = uncore_freerunning_type(event->attr.config);
+ unsigned int type = uncore_freerunning_type(event->hw.config);
return box->pmu->type->freerunning[type].num_counters;
}
@@ -382,8 +382,8 @@ static inline int uncore_num_freerunning_types(struct intel_uncore_box *box,
static inline bool check_valid_freerunning_event(struct intel_uncore_box *box,
struct perf_event *event)
{
- unsigned int type = uncore_freerunning_type(event->attr.config);
- unsigned int idx = uncore_freerunning_idx(event->attr.config);
+ unsigned int type = uncore_freerunning_type(event->hw.config);
+ unsigned int idx = uncore_freerunning_idx(event->hw.config);
return (type < uncore_num_freerunning_types(box, event)) &&
(idx < uncore_num_freerunning(box, event));
diff --git a/arch/x86/events/intel/uncore_snb.c b/arch/x86/events/intel/uncore_snb.c
index 8527c3e1038b..48d7121f71c7 100644
--- a/arch/x86/events/intel/uncore_snb.c
+++ b/arch/x86/events/intel/uncore_snb.c
@@ -425,9 +425,11 @@ static int snb_uncore_imc_event_init(struct perf_event *event)
/* must be done before validate_group */
event->hw.event_base = base;
- event->hw.config = cfg;
event->hw.idx = idx;
+ /* Convert to standard encoding format for free running counters */
+ event->hw.config = ((cfg - 1) << 8) | 0x10ff;
+
/* no group validation needed, we have free running counters */
return 0;
--
2.14.3
From: Rafał Miłecki <rafal(a)milecki.pl>
The driver can report IEEE80211_VHT_CAP_SUPP_CHAN_WIDTH_160MHZ, so
it's important to provide valid & complete info about the bandwidths
supported on each channel. By default no support for 160 MHz should be
assumed unless the firmware reports it for a given channel later.
This fixes the info passed to userspace. Without this change userspace
could try to use an invalid channel and fail to start an interface.
Signed-off-by: Rafał Miłecki <rafal(a)milecki.pl>
Cc: stable(a)vger.kernel.org
---
drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
index 230a378c26fc..7f0a5bade70a 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
@@ -6005,7 +6005,8 @@ static int brcmf_construct_chaninfo(struct brcmf_cfg80211_info *cfg,
* for subsequent chanspecs.
*/
channel->flags = IEEE80211_CHAN_NO_HT40 |
- IEEE80211_CHAN_NO_80MHZ;
+ IEEE80211_CHAN_NO_80MHZ |
+ IEEE80211_CHAN_NO_160MHZ;
ch.bw = BRCMU_CHAN_BW_20;
cfg->d11inf.encchspec(&ch);
chaninfo = ch.chspec;
--
2.13.7