December 2020 - Linux-stable-mirror

FAILED: patch "[PATCH] of: fix linker-section match-table corruption" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

>From 5812b32e01c6d86ba7a84110702b46d8a8531fe9 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Mon, 23 Nov 2020 11:23:12 +0100
Subject: [PATCH] of: fix linker-section match-table corruption

Specify type alignment when declaring linker-section match-table entries
to prevent gcc from increasing alignment and corrupting the various
tables with padding (e.g. timers, irqchips, clocks, reserved memory).

This is specifically needed on x86 where gcc (typically) aligns larger
objects like struct of_device_id with static extent on 32-byte
boundaries which at best prevents matching on anything but the first
entry. Specifying alignment when declaring variables suppresses this
optimisation.

Here's a 64-bit example where all entries are corrupt as 16 bytes of
padding has been inserted before the first entry:

	ffffffff8266b4b0 D __clk_of_table
	ffffffff8266b4c0 d __of_table_fixed_factor_clk
	ffffffff8266b5a0 d __of_table_fixed_clk
	ffffffff8266b680 d __clk_of_table_sentinel

And here's a 32-bit example where the 8-byte-aligned table happens to be
placed on a 32-byte boundary so that all but the first entry are corrupt
due to the 28 bytes of padding inserted between entries:

	812b3ec0 D __irqchip_of_table
	812b3ec0 d __of_table_irqchip1
	812b3fa0 d __of_table_irqchip2
	812b4080 d __of_table_irqchip3
	812b4160 d irqchip_of_match_end

Verified on x86 using gcc-9.3 and gcc-4.9 (which uses 64-byte
alignment), and on arm using gcc-7.2. Note that there are no in-tree
users of these tables on x86 currently (even if they are included in the
image).

Fixes: 54196ccbe0ba ("of: consolidate linker section OF match table declarations")
Fixes: f6e916b82022 ("irqchip: add basic infrastructure")
Cc: stable <stable(a)vger.kernel.org>     # 3.9
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Link: https://lore.kernel.org/r/20201123102319.8090-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>

diff --git a/include/linux/of.h b/include/linux/of.h
index 5d51891cbf1a..af655d264f10 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -1300,6 +1300,7 @@ static inline int of_get_available_child_count(const struct device_node *np)
 #define _OF_DECLARE(table, name, compat, fn, fn_type)			\
 	static const struct of_device_id __of_table_##name		\
 		__used __section("__" #table "_of_table")		\
+		__aligned(__alignof__(struct of_device_id))		\
 		 = { .compatible = compat,				\
 		     .data = (fn == (fn_type)NULL) ? fn : fn  }
 #else
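The padding figures quoted in the commit message can be re-derived from the symbol addresses. The following is a minimal sketch, not kernel code: the helper names are hypothetical, and the entry size assumes that struct of_device_id consists of three char arrays of 32, 32 and 128 bytes followed by a single pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of struct of_device_id: name[32], type[32],
 * compatible[128], then one data pointer. The pointer width is a
 * parameter so both the 32-bit and 64-bit cases can be checked on any host. */
size_t of_device_id_size(size_t pointer_bytes)
{
	assert(pointer_bytes == 4 || pointer_bytes == 8);
	return 32 + 32 + 128 + pointer_bytes;
}

/* Round size up to the next multiple of align (align must be a power of two). */
size_t round_up_to(size_t size, size_t align)
{
	return (size + align - 1) & ~(align - 1);
}

/* Padding between two consecutive symbols: observed stride minus entry size. */
size_t padding_between(uint64_t first, uint64_t second, size_t entry_size)
{
	return (size_t)(second - first) - entry_size;
}
```

Under these assumptions a 32-bit entry is 196 bytes, rounding it up to a 32-byte boundary gives a 224-byte stride, and the difference is the 28 bytes of padding mentioned above. Calling `padding_between(0x812b3ec0, 0x812b3fa0, of_device_id_size(4))` reproduces that figure from the listed addresses.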


FAILED: patch "[PATCH] of: fix linker-section match-table corruption" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

>From 5812b32e01c6d86ba7a84110702b46d8a8531fe9 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Mon, 23 Nov 2020 11:23:12 +0100
Subject: [PATCH] of: fix linker-section match-table corruption

Specify type alignment when declaring linker-section match-table entries
to prevent gcc from increasing alignment and corrupting the various
tables with padding (e.g. timers, irqchips, clocks, reserved memory).

This is specifically needed on x86 where gcc (typically) aligns larger
objects like struct of_device_id with static extent on 32-byte
boundaries which at best prevents matching on anything but the first
entry. Specifying alignment when declaring variables suppresses this
optimisation.

Here's a 64-bit example where all entries are corrupt as 16 bytes of
padding has been inserted before the first entry:

	ffffffff8266b4b0 D __clk_of_table
	ffffffff8266b4c0 d __of_table_fixed_factor_clk
	ffffffff8266b5a0 d __of_table_fixed_clk
	ffffffff8266b680 d __clk_of_table_sentinel

And here's a 32-bit example where the 8-byte-aligned table happens to be
placed on a 32-byte boundary so that all but the first entry are corrupt
due to the 28 bytes of padding inserted between entries:

	812b3ec0 D __irqchip_of_table
	812b3ec0 d __of_table_irqchip1
	812b3fa0 d __of_table_irqchip2
	812b4080 d __of_table_irqchip3
	812b4160 d irqchip_of_match_end

Verified on x86 using gcc-9.3 and gcc-4.9 (which uses 64-byte
alignment), and on arm using gcc-7.2. Note that there are no in-tree
users of these tables on x86 currently (even if they are included in the
image).

Fixes: 54196ccbe0ba ("of: consolidate linker section OF match table declarations")
Fixes: f6e916b82022 ("irqchip: add basic infrastructure")
Cc: stable <stable(a)vger.kernel.org>     # 3.9
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Link: https://lore.kernel.org/r/20201123102319.8090-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>

diff --git a/include/linux/of.h b/include/linux/of.h
index 5d51891cbf1a..af655d264f10 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -1300,6 +1300,7 @@ static inline int of_get_available_child_count(const struct device_node *np)
 #define _OF_DECLARE(table, name, compat, fn, fn_type)			\
 	static const struct of_device_id __of_table_##name		\
 		__used __section("__" #table "_of_table")		\
+		__aligned(__alignof__(struct of_device_id))		\
 		 = { .compatible = compat,				\
 		     .data = (fn == (fn_type)NULL) ? fn : fn  }
 #else


[PATCH 4.14] mm: memcontrol: fix excessive complexity in memory.stat reporting

by Shaoying Xu

From: Johannes Weiner <hannes(a)cmpxchg.org>

[ Upstream commit a983b5ebee57209c99f68c8327072f25e0e6e3da ]

We've seen memory.stat reads in top-level cgroups take up to fourteen
seconds during a userspace bug that created tens of thousands of ghost
cgroups pinned by lingering page cache.

Even with a more reasonable number of cgroups, aggregating memory.stat
is unnecessarily heavy. The complexity is this:

	nr_cgroups * nr_stat_items * nr_possible_cpus

where the stat items are ~70 at this point. With 128 cgroups and 128
CPUs - decent, not enormous setups - reading the top-level memory.stat
has to aggregate over a million per-cpu counters. This doesn't scale.

Instead of spreading the source of truth across all CPUs, use the
per-cpu counters merely to batch updates to shared atomic counters.

This is the same as the per-cpu stocks we use for charging memory to the
shared atomic page_counters, and also the way the global vmstat counters
are implemented.

Vmstat has elaborate spilling thresholds that depend on the number of
CPUs, amount of memory, and memory pressure - carefully balancing the
cost of counter updates with the amount of per-cpu error. That's
because the vmstat counters are system-wide, but also used for decisions
inside the kernel (e.g. NR_FREE_PAGES in the allocator). Neither is
true for the memory controller.

Use the same static batch size we already use for page_counter updates
during charging. The per-cpu error in the stats will be 128k, which is
an acceptable ratio of cores to memory accounting granularity.

[hannes(a)cmpxchg.org: fix warning in __this_cpu_xchg() calls]
  Link: http://lkml.kernel.org/r/20171201135750.GB8097@cmpxchg.org
Link: http://lkml.kernel.org/r/20171103153336.24044-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Acked-by: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: stable(a)vger.kernel.org c9019e9: mm: memcontrol: eliminate raw access to stat and event counters
Cc: stable(a)vger.kernel.org 2845426: mm: memcontrol: implement lruvec stat functions on top of each other
Cc: stable(a)vger.kernel.org
[shaoyi(a)amazon.com: resolved the conflict brought by commit
 17ffa29c355658c8e9b19f56cbf0388500ca7905 in mm/memcontrol.c by contextual fix]
Signed-off-by: Shaoying Xu <shaoyi(a)amazon.com>
---
The excessive complexity in memory.stat reporting was fixed in v4.16 but
didn't appear to make it to 4.14 stable. When backporting this patch, there
is a small conflict brought by commit 17ffa29c355658c8e9b19f56cbf0388500ca7905
within free_mem_cgroup_per_node_info() of mm/memcontrol.c and can be resolved
by contextual fix.

 include/linux/memcontrol.h |  96 +++++++++++++++++++++++++++---------------
 mm/memcontrol.c            | 101 +++++++++++++++++++++++----------------------
 2 files changed, 113 insertions(+), 84 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1ffc54ac4cc9..882046863581 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -108,7 +108,10 @@ struct lruvec_stat {
  */
 struct mem_cgroup_per_node {
 	struct lruvec		lruvec;
-	struct lruvec_stat __percpu *lruvec_stat;
+
+	struct lruvec_stat __percpu *lruvec_stat_cpu;
+	atomic_long_t		lruvec_stat[NR_VM_NODE_STAT_ITEMS];
+
 	unsigned long		lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
 
 	struct mem_cgroup_reclaim_iter	iter[DEF_PRIORITY + 1];
@@ -227,10 +230,10 @@ struct mem_cgroup {
 	spinlock_t		move_lock;
 	struct task_struct	*move_lock_task;
 	unsigned long		move_lock_flags;
-	/*
-	 * percpu counter.
-	 */
-	struct mem_cgroup_stat_cpu __percpu *stat;
+
+	struct mem_cgroup_stat_cpu __percpu *stat_cpu;
+	atomic_long_t		stat[MEMCG_NR_STAT];
+	atomic_long_t		events[MEMCG_NR_EVENTS];
 
 	unsigned long		socket_pressure;
 
@@ -265,6 +268,12 @@ struct mem_cgroup {
 	/* WARNING: nodeinfo must be the last member here */
 };
 
+/*
+ * size of first charge trial. "32" comes from vmscan.c's magic value.
+ * TODO: maybe necessary to use big numbers in big irons.
+ */
+#define MEMCG_CHARGE_BATCH 32U
+
 extern struct mem_cgroup *root_mem_cgroup;
 
 static inline bool mem_cgroup_disabled(void)
@@ -485,32 +494,38 @@ void unlock_page_memcg(struct page *page);
 static inline unsigned long memcg_page_state(struct mem_cgroup *memcg,
 					     int idx)
 {
-	long val = 0;
-	int cpu;
-
-	for_each_possible_cpu(cpu)
-		val += per_cpu(memcg->stat->count[idx], cpu);
-
-	if (val < 0)
-		val = 0;
-
-	return val;
+	long x = atomic_long_read(&memcg->stat[idx]);
+#ifdef CONFIG_SMP
+	if (x < 0)
+		x = 0;
+#endif
+	return x;
 }
 
 /* idx can be of type enum memcg_stat_item or node_stat_item */
 static inline void __mod_memcg_state(struct mem_cgroup *memcg,
 				     int idx, int val)
 {
-	if (!mem_cgroup_disabled())
-		__this_cpu_add(memcg->stat->count[idx], val);
+	long x;
+
+	if (mem_cgroup_disabled())
+		return;
+
+	x = val + __this_cpu_read(memcg->stat_cpu->count[idx]);
+	if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
+		atomic_long_add(x, &memcg->stat[idx]);
+		x = 0;
+	}
+	__this_cpu_write(memcg->stat_cpu->count[idx], x);
 }
 
 /* idx can be of type enum memcg_stat_item or node_stat_item */
 static inline void mod_memcg_state(struct mem_cgroup *memcg,
 				   int idx, int val)
 {
-	if (!mem_cgroup_disabled())
-		this_cpu_add(memcg->stat->count[idx], val);
+	preempt_disable();
+	__mod_memcg_state(memcg, idx, val);
+	preempt_enable();
 }
 
 /**
@@ -548,26 +563,25 @@ static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
 					      enum node_stat_item idx)
 {
 	struct mem_cgroup_per_node *pn;
-	long val = 0;
-	int cpu;
+	long x;
 
 	if (mem_cgroup_disabled())
 		return node_page_state(lruvec_pgdat(lruvec), idx);
 
 	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
-	for_each_possible_cpu(cpu)
-		val += per_cpu(pn->lruvec_stat->count[idx], cpu);
-
-	if (val < 0)
-		val = 0;
-
-	return val;
+	x = atomic_long_read(&pn->lruvec_stat[idx]);
+#ifdef CONFIG_SMP
+	if (x < 0)
+		x = 0;
+#endif
+	return x;
 }
 
 static inline void __mod_lruvec_state(struct lruvec *lruvec,
 				      enum node_stat_item idx, int val)
 {
 	struct mem_cgroup_per_node *pn;
+	long x;
 
 	/* Update node */
 	__mod_node_page_state(lruvec_pgdat(lruvec), idx, val);
@@ -581,7 +595,12 @@ static inline void __mod_lruvec_state(struct lruvec *lruvec,
 	__mod_memcg_state(pn->memcg, idx, val);
 
 	/* Update lruvec */
-	__this_cpu_add(pn->lruvec_stat->count[idx], val);
+	x = val + __this_cpu_read(pn->lruvec_stat_cpu->count[idx]);
+	if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
+		atomic_long_add(x, &pn->lruvec_stat[idx]);
+		x = 0;
+	}
+	__this_cpu_write(pn->lruvec_stat_cpu->count[idx], x);
 }
 
 static inline void mod_lruvec_state(struct lruvec *lruvec,
@@ -624,16 +643,25 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 static inline void __count_memcg_events(struct mem_cgroup *memcg,
 					int idx, unsigned long count)
 {
-	if (!mem_cgroup_disabled())
-		__this_cpu_add(memcg->stat->events[idx], count);
+	unsigned long x;
+
+	if (mem_cgroup_disabled())
+		return;
+
+	x = count + __this_cpu_read(memcg->stat_cpu->events[idx]);
+	if (unlikely(x > MEMCG_CHARGE_BATCH)) {
+		atomic_long_add(x, &memcg->events[idx]);
+		x = 0;
+	}
+	__this_cpu_write(memcg->stat_cpu->events[idx], x);
 }
 
-/* idx can be of type enum memcg_event_item or vm_event_item */
 static inline void count_memcg_events(struct mem_cgroup *memcg,
 				      int idx, unsigned long count)
 {
-	if (!mem_cgroup_disabled())
-		this_cpu_add(memcg->stat->events[idx], count);
+	preempt_disable();
+	__count_memcg_events(memcg, idx, count);
+	preempt_enable();
 }
 
 /* idx can be of type enum memcg_event_item or vm_event_item */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index eba9dc4795b5..4e763cdccb33 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -542,39 +542,10 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
 	return mz;
 }
 
-/*
- * Return page count for single (non recursive) @memcg.
- *
- * Implementation Note: reading percpu statistics for memcg.
- *
- * Both of vmstat[] and percpu_counter has threshold and do periodic
- * synchronization to implement "quick" read. There are trade-off between
- * reading cost and precision of value. Then, we may have a chance to implement
- * a periodic synchronization of counter in memcg's counter.
- *
- * But this _read() function is used for user interface now. The user accounts
- * memory usage by memory cgroup and he _always_ requires exact value because
- * he accounts memory. Even if we provide quick-and-fuzzy read, we always
- * have to visit all online cpus and make sum. So, for now, unnecessary
- * synchronization is not implemented. (just implemented for cpu hotplug)
- *
- * If there are kernel internal actions which can make use of some not-exact
- * value, and reading all cpu value can be performance bottleneck in some
- * common workload, threshold and synchronization as vmstat[] should be
- * implemented.
- *
- * The parameter idx can be of type enum memcg_event_item or vm_event_item.
- */
-
 static unsigned long memcg_sum_events(struct mem_cgroup *memcg,
 				      int event)
 {
-	unsigned long val = 0;
-	int cpu;
-
-	for_each_possible_cpu(cpu)
-		val += per_cpu(memcg->stat->events[event], cpu);
-	return val;
+	return atomic_long_read(&memcg->events[event]);
 }
 
 static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
@@ -606,7 +577,7 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
 		nr_pages = -nr_pages; /* for event */
 	}
 
-	__this_cpu_add(memcg->stat->nr_page_events, nr_pages);
+	__this_cpu_add(memcg->stat_cpu->nr_page_events, nr_pages);
 }
 
 unsigned long mem_cgroup_node_nr_lru_pages(struct mem_cgroup *memcg,
@@ -642,8 +613,8 @@ static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
 {
 	unsigned long val, next;
 
-	val = __this_cpu_read(memcg->stat->nr_page_events);
-	next = __this_cpu_read(memcg->stat->targets[target]);
+	val = __this_cpu_read(memcg->stat_cpu->nr_page_events);
+	next = __this_cpu_read(memcg->stat_cpu->targets[target]);
 	/* from time_after() in jiffies.h */
 	if ((long)(next - val) < 0) {
 		switch (target) {
@@ -659,7 +630,7 @@ static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
 		default:
 			break;
 		}
-		__this_cpu_write(memcg->stat->targets[target], next);
+		__this_cpu_write(memcg->stat_cpu->targets[target], next);
 		return true;
 	}
 	return false;
@@ -1726,11 +1697,6 @@ void unlock_page_memcg(struct page *page)
 }
 EXPORT_SYMBOL(unlock_page_memcg);
 
-/*
- * size of first charge trial. "32" comes from vmscan.c's magic value.
- * TODO: maybe necessary to use big numbers in big irons.
- */
-#define CHARGE_BATCH	32U
 struct memcg_stock_pcp {
 	struct mem_cgroup *cached; /* this never be root cgroup */
 	unsigned int nr_pages;
@@ -1758,7 +1724,7 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 	unsigned long flags;
 	bool ret = false;
 
-	if (nr_pages > CHARGE_BATCH)
+	if (nr_pages > MEMCG_CHARGE_BATCH)
 		return ret;
 
 	local_irq_save(flags);
@@ -1827,7 +1793,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 	}
 	stock->nr_pages += nr_pages;
 
-	if (stock->nr_pages > CHARGE_BATCH)
+	if (stock->nr_pages > MEMCG_CHARGE_BATCH)
 		drain_stock(stock);
 
 	local_irq_restore(flags);
@@ -1877,9 +1843,44 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 static int memcg_hotplug_cpu_dead(unsigned int cpu)
 {
 	struct memcg_stock_pcp *stock;
+	struct mem_cgroup *memcg;
 
 	stock = &per_cpu(memcg_stock, cpu);
 	drain_stock(stock);
+
+	for_each_mem_cgroup(memcg) {
+		int i;
+
+		for (i = 0; i < MEMCG_NR_STAT; i++) {
+			int nid;
+			long x;
+
+			x = this_cpu_xchg(memcg->stat_cpu->count[i], 0);
+			if (x)
+				atomic_long_add(x, &memcg->stat[i]);
+
+			if (i >= NR_VM_NODE_STAT_ITEMS)
+				continue;
+
+			for_each_node(nid) {
+				struct mem_cgroup_per_node *pn;
+
+				pn = mem_cgroup_nodeinfo(memcg, nid);
+				x = this_cpu_xchg(pn->lruvec_stat_cpu->count[i], 0);
+				if (x)
+					atomic_long_add(x, &pn->lruvec_stat[i]);
+			}
+		}
+
+		for (i = 0; i < MEMCG_NR_EVENTS; i++) {
+			long x;
+
+			x = this_cpu_xchg(memcg->stat_cpu->events[i], 0);
+			if (x)
+				atomic_long_add(x, &memcg->events[i]);
+		}
+	}
+
 	return 0;
 }
 
@@ -1900,7 +1901,7 @@ static void high_work_func(struct work_struct *work)
 	struct mem_cgroup *memcg;
 
 	memcg = container_of(work, struct mem_cgroup, high_work);
-	reclaim_high(memcg, CHARGE_BATCH, GFP_KERNEL);
+	reclaim_high(memcg, MEMCG_CHARGE_BATCH, GFP_KERNEL);
 }
 
 /*
@@ -1924,7 +1925,7 @@ void mem_cgroup_handle_over_high(void)
 static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		      unsigned int nr_pages)
 {
-	unsigned int batch = max(CHARGE_BATCH, nr_pages);
+	unsigned int batch = max(MEMCG_CHARGE_BATCH, nr_pages);
 	int nr_retries = MEM_CGROUP_RECLAIM_RETRIES;
 	struct mem_cgroup *mem_over_limit;
 	struct page_counter *counter;
@@ -4203,8 +4204,8 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 	if (!pn)
 		return 1;
 
-	pn->lruvec_stat = alloc_percpu(struct lruvec_stat);
-	if (!pn->lruvec_stat) {
+	pn->lruvec_stat_cpu = alloc_percpu(struct lruvec_stat);
+	if (!pn->lruvec_stat_cpu) {
 		kfree(pn);
 		return 1;
 	}
@@ -4225,7 +4226,7 @@ static void free_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 	if (!pn)
 		return;
 
-	free_percpu(pn->lruvec_stat);
+	free_percpu(pn->lruvec_stat_cpu);
 	kfree(pn);
 }
 
@@ -4235,7 +4236,7 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
 
 	for_each_node(node)
 		free_mem_cgroup_per_node_info(memcg, node);
-	free_percpu(memcg->stat);
+	free_percpu(memcg->stat_cpu);
 	kfree(memcg);
 }
 
@@ -4264,8 +4265,8 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	if (memcg->id.id < 0)
 		goto fail;
 
-	memcg->stat = alloc_percpu(struct mem_cgroup_stat_cpu);
-	if (!memcg->stat)
+	memcg->stat_cpu = alloc_percpu(struct mem_cgroup_stat_cpu);
+	if (!memcg->stat_cpu)
 		goto fail;
 
 	for_each_node(node)
@@ -5686,7 +5687,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 	__mod_memcg_state(ug->memcg, MEMCG_RSS_HUGE, -ug->nr_huge);
 	__mod_memcg_state(ug->memcg, NR_SHMEM, -ug->nr_shmem);
 	__count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout);
-	__this_cpu_add(ug->memcg->stat_cpu->nr_page_events, nr_pages);
+	__this_cpu_add(ug->memcg->stat_cpu->nr_page_events, nr_pages);
 	memcg_check_events(ug->memcg, ug->dummy_page);
 	local_irq_restore(flags);
 
-- 
2.16.6
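The batching scheme this patch introduces can be modelled in userspace. The sketch below is an illustration under stated assumptions, not the kernel implementation: the type and function names are hypothetical, a plain array indexed by a simulated CPU number stands in for real per-cpu storage, and C11 atomics stand in for atomic_long_t. Each CPU accumulates a private delta and spills it into the shared counter only once its magnitude exceeds the batch size, so a read is a single atomic load.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NR_CPUS_MODEL 4   /* hypothetical: number of simulated CPUs */
#define BATCH 32          /* mirrors MEMCG_CHARGE_BATCH in the patch */

struct batched_counter {
	atomic_long shared;             /* the value readers see */
	long pending[NR_CPUS_MODEL];    /* per-CPU deltas not yet folded in */
};

/* Add val on behalf of one simulated CPU; spill once |pending| exceeds BATCH. */
void counter_mod(struct batched_counter *c, int cpu, long val)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	long x = c->pending[cpu] + val;

	if (labs(x) > BATCH) {
		atomic_fetch_add(&c->shared, x);
		x = 0;
	}
	c->pending[cpu] = x;
}

/* O(1) read: no per-CPU walk; a transiently negative sum is clamped to zero. */
unsigned long counter_read(struct batched_counter *c)
{
	long x = atomic_load(&c->shared);

	return x < 0 ? 0 : (unsigned long)x;
}

/* Fold one CPU's leftover delta into the shared value, as on CPU hot-unplug. */
void counter_fold_cpu(struct batched_counter *c, int cpu)
{
	long x = c->pending[cpu];

	c->pending[cpu] = 0;
	if (x)
		atomic_fetch_add(&c->shared, x);
}

/* Number of per-CPU counters the old read path had to sum. */
unsigned long old_read_cost(unsigned long cgroups, unsigned long items,
			    unsigned long cpus)
{
	return cgroups * items * cpus;
}
```

With the commit message's figures, `old_read_cost(128, 70, 128)` is 1,146,880 counters per read. The price of the new scheme is staleness: each CPU can hold back up to 32 units, which for page counters and an assumed 4 KiB page size is the 128k of per-cpu error the message mentions.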


FAILED: patch "[PATCH] xenbus/xenbus_backend: Disallow pending watch messages" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

>From 9996bd494794a2fe393e97e7a982388c6249aa76 Mon Sep 17 00:00:00 2001
From: SeongJae Park <sjpark(a)amazon.de>
Date: Mon, 14 Dec 2020 10:08:40 +0100
Subject: [PATCH] xenbus/xenbus_backend: Disallow pending watch messages

'xenbus_backend' watches 'state' of devices, which is writable by
guests. Hence, if guests intensively updates it, dom0 will have lots of
pending events that exhausting memory of dom0. In other words, guests
can trigger dom0 memory pressure. This is known as XSA-349. However,
the watch callback of it, 'frontend_changed()', reads only 'state', so
doesn't need to have the pending events.

To avoid the problem, this commit disallows pending watch messages for
'xenbus_backend' using the 'will_handle()' watch callback.

This is part of XSA-349

Cc: stable(a)vger.kernel.org
Signed-off-by: SeongJae Park <sjpark(a)amazon.de>
Reported-by: Michael Kurth <mku(a)amazon.de>
Reported-by: Pawel Wieczorkiewicz <wipawel(a)amazon.de>
Reviewed-by: Juergen Gross <jgross(a)suse.com>
Signed-off-by: Juergen Gross <jgross(a)suse.com>

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c
index 2ba699897e6d..5abded97e1a7 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -180,6 +180,12 @@ static int xenbus_probe_backend(struct xen_bus_type *bus, const char *type,
 	return err;
 }
 
+static bool frontend_will_handle(struct xenbus_watch *watch,
+				 const char *path, const char *token)
+{
+	return watch->nr_pending == 0;
+}
+
 static void frontend_changed(struct xenbus_watch *watch,
 			     const char *path, const char *token)
 {
@@ -191,6 +197,7 @@ static struct xen_bus_type xenbus_backend = {
 	.levels = 3,		/* backend/type/<frontend>/<id> */
 	.get_bus_id = backend_bus_id,
 	.probe = xenbus_probe_backend,
+	.otherend_will_handle = frontend_will_handle,
 	.otherend_changed = frontend_changed,
 	.bus = {
 		.name		= "xen-backend",
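The effect of gating new events on `nr_pending == 0` can be seen in a small userspace model. This is a sketch only: `struct watch_model`, `offer_event()` and `handle_one()` are hypothetical names, and the callback takes just the watch rather than the path and token the kernel callback receives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a watch; not the kernel's struct xenbus_watch. */
struct watch_model {
	unsigned int nr_pending;                      /* events queued, not yet handled */
	bool (*will_handle)(struct watch_model *w);   /* NULL means "accept everything" */
};

/* Same policy as frontend_will_handle(): accept only when nothing is queued. */
bool accept_if_idle(struct watch_model *w)
{
	return w->nr_pending == 0;
}

/* Offer one event; returns true if it was queued, false if it was discarded. */
bool offer_event(struct watch_model *w)
{
	assert(w != NULL);
	if (w->will_handle && !w->will_handle(w))
		return false;
	w->nr_pending++;
	return true;
}

/* The consumer has handled one queued event. */
void handle_one(struct watch_model *w)
{
	if (w->nr_pending)
		w->nr_pending--;
}
```

In this model, offering 1000 events to a watch that uses `accept_if_idle` leaves exactly one queued, whereas a watch with a NULL filter queues all 1000. That bound on queue growth is the property the patch relies on, since the handler only re-reads the current 'state' anyway.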


FAILED: patch "[PATCH] xen/xenbus: Count pending messages for each watch" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

>From 3dc86ca6b4c8cfcba9da7996189d1b5a358a94fc Mon Sep 17 00:00:00 2001
From: SeongJae Park <sjpark(a)amazon.de>
Date: Mon, 14 Dec 2020 10:07:13 +0100
Subject: [PATCH] xen/xenbus: Count pending messages for each watch

This commit adds a counter of pending messages for each watch in the
struct. It is used to skip unnecessary pending messages lookup in
'unregister_xenbus_watch()'. It could also be used in 'will_handle'
callback.

This is part of XSA-349

Cc: stable(a)vger.kernel.org
Signed-off-by: SeongJae Park <sjpark(a)amazon.de>
Reported-by: Michael Kurth <mku(a)amazon.de>
Reported-by: Pawel Wieczorkiewicz <wipawel(a)amazon.de>
Reviewed-by: Juergen Gross <jgross(a)suse.com>
Signed-off-by: Juergen Gross <jgross(a)suse.com>

diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
index e8bdbd0a1e26..12e02eb01f59 100644
--- a/drivers/xen/xenbus/xenbus_xs.c
+++ b/drivers/xen/xenbus/xenbus_xs.c
@@ -711,6 +711,7 @@ int xs_watch_msg(struct xs_watch_event *event)
 					 event->path, event->token))) {
 		spin_lock(&watch_events_lock);
 		list_add_tail(&event->list, &watch_events);
+		event->handle->nr_pending++;
 		wake_up(&watch_events_waitq);
 		spin_unlock(&watch_events_lock);
 	} else
@@ -768,6 +769,8 @@ int register_xenbus_watch(struct xenbus_watch *watch)
 
 	sprintf(token, "%lX", (long)watch);
 
+	watch->nr_pending = 0;
+
 	down_read(&xs_watch_rwsem);
 
 	spin_lock(&watches_lock);
@@ -817,11 +820,14 @@ void unregister_xenbus_watch(struct xenbus_watch *watch)
 
 	/* Cancel pending watch events. */
 	spin_lock(&watch_events_lock);
-	list_for_each_entry_safe(event, tmp, &watch_events, list) {
-		if (event->handle != watch)
-			continue;
-		list_del(&event->list);
-		kfree(event);
+	if (watch->nr_pending) {
+		list_for_each_entry_safe(event, tmp, &watch_events, list) {
+			if (event->handle != watch)
+				continue;
+			list_del(&event->list);
+			kfree(event);
+		}
+		watch->nr_pending = 0;
 	}
 	spin_unlock(&watch_events_lock);
 
@@ -868,7 +874,6 @@ void xs_suspend_cancel(void)
 
 static int xenwatch_thread(void *unused)
 {
-	struct list_head *ent;
 	struct xs_watch_event *event;
 
 	xenwatch_pid = current->pid;
@@ -883,13 +888,15 @@ static int xenwatch_thread(void *unused)
 		mutex_lock(&xenwatch_mutex);
 
 		spin_lock(&watch_events_lock);
-		ent = watch_events.next;
-		if (ent != &watch_events)
-			list_del(ent);
+		event = list_first_entry_or_null(&watch_events,
+				struct xs_watch_event, list);
+		if (event) {
+			list_del(&event->list);
+			event->handle->nr_pending--;
+		}
 		spin_unlock(&watch_events_lock);
 
-		if (ent != &watch_events) {
-			event = list_entry(ent, struct xs_watch_event, list);
+		if (event) {
 			event->handle->callback(event->handle,
 						event->path, event->token);
 			kfree(event);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index c8574d1b814c..00c7235ae93e 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -61,6 +61,8 @@ struct xenbus_watch
 	/* Path being watched. */
 	const char *node;
 
+	unsigned int nr_pending;
+
 	/*
 	 * Called just before enqueing new event while a spinlock is held.
 	 * The event will be discarded if this callback returns false.
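The bookkeeping added here lets the unregister path avoid walking the shared event list when a watch has nothing queued. A rough userspace sketch of that idea follows; all names are hypothetical, a singly linked list with head insertion replaces the kernel's list_head with tail insertion, and locking is omitted. The `nodes_visited` field exists only to make the saved work observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct watch {
	unsigned int nr_pending;        /* queued events that point at this watch */
};

struct event {
	struct event *next;
	struct watch *handle;
};

struct queue {
	struct event *head;
	unsigned long nodes_visited;    /* instrumentation: nodes examined on cancel */
};

/* Queue one event for w and account for it. Returns 0 on success, -1 on OOM. */
int enqueue(struct queue *q, struct watch *w)
{
	struct event *ev;

	assert(q != NULL && w != NULL);
	ev = malloc(sizeof(*ev));
	if (!ev)
		return -1;
	ev->handle = w;
	ev->next = q->head;
	q->head = ev;
	w->nr_pending++;
	return 0;
}

/* Drop every queued event for w; skip the walk entirely if none are pending. */
void cancel_pending(struct queue *q, struct watch *w)
{
	struct event **link = &q->head;

	if (w->nr_pending == 0)
		return;

	while (*link) {
		struct event *ev = *link;

		q->nodes_visited++;
		if (ev->handle == w) {
			*link = ev->next;
			free(ev);
		} else {
			link = &ev->next;
		}
	}
	w->nr_pending = 0;
}
```

Cancelling a watch that has no queued events returns before touching the list, however many events other watches have queued.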


FAILED: patch "[PATCH] xen/xenbus/xen_bus_type: Support will_handle watch callback" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

>From be987200fbaceaef340872841d4f7af2c5ee8dc3 Mon Sep 17 00:00:00 2001
From: SeongJae Park <sjpark(a)amazon.de>
Date: Mon, 14 Dec 2020 10:05:47 +0100
Subject: [PATCH] xen/xenbus/xen_bus_type: Support will_handle watch callback

This commit adds support of the 'will_handle' watch callback for
'xen_bus_type' users.

This is part of XSA-349

Cc: stable(a)vger.kernel.org
Signed-off-by: SeongJae Park <sjpark(a)amazon.de>
Reported-by: Michael Kurth <mku(a)amazon.de>
Reported-by: Pawel Wieczorkiewicz <wipawel(a)amazon.de>
Reviewed-by: Juergen Gross <jgross(a)suse.com>
Signed-off-by: Juergen Gross <jgross(a)suse.com>

diff --git a/drivers/xen/xenbus/xenbus.h b/drivers/xen/xenbus/xenbus.h
index 5f5b8a7d5b80..2a93b7c9c159 100644
--- a/drivers/xen/xenbus/xenbus.h
+++ b/drivers/xen/xenbus/xenbus.h
@@ -44,6 +44,8 @@ struct xen_bus_type {
 	int (*get_bus_id)(char bus_id[XEN_BUS_ID_SIZE], const char *nodename);
 	int (*probe)(struct xen_bus_type *bus, const char *type,
 		     const char *dir);
+	bool (*otherend_will_handle)(struct xenbus_watch *watch,
+				     const char *path, const char *token);
 	void (*otherend_changed)(struct xenbus_watch *watch, const char *path,
 				 const char *token);
 	struct bus_type bus;
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 4c3d1b84aa0a..44634d970a5c 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -136,7 +136,8 @@ static int watch_otherend(struct xenbus_device *dev)
 		container_of(dev->dev.bus, struct xen_bus_type, bus);
 
 	return xenbus_watch_pathfmt(dev, &dev->otherend_watch,
-				    NULL, bus->otherend_changed,
+				    bus->otherend_will_handle,
+				    bus->otherend_changed,
 				    "%s/%s", dev->otherend, "state");
 }
 


FAILED: patch "[PATCH] xen/xenbus: Add 'will_handle' callback support in" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

>From 2e85d32b1c865bec703ce0c962221a5e955c52c2 Mon Sep 17 00:00:00 2001
From: SeongJae Park <sjpark(a)amazon.de>
Date: Mon, 14 Dec 2020 10:04:18 +0100
Subject: [PATCH] xen/xenbus: Add 'will_handle' callback support in xenbus_watch_path()

Some code does not directly make 'xenbus_watch' object and call
'register_xenbus_watch()' but use 'xenbus_watch_path()' instead. This
commit adds support of 'will_handle' callback in the
'xenbus_watch_path()' and it's wrapper, 'xenbus_watch_pathfmt()'.

This is part of XSA-349

Cc: stable(a)vger.kernel.org
Signed-off-by: SeongJae Park <sjpark(a)amazon.de>
Reported-by: Michael Kurth <mku(a)amazon.de>
Reported-by: Pawel Wieczorkiewicz <wipawel(a)amazon.de>
Reviewed-by: Juergen Gross <jgross(a)suse.com>
Signed-off-by: Juergen Gross <jgross(a)suse.com>

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 76912c584a76..1d8b8d24496c 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -675,7 +675,8 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
 	/* setup back pointer */
 	be->blkif->be = be;
 
-	err = xenbus_watch_pathfmt(dev, &be->backend_watch, backend_changed,
+	err = xenbus_watch_pathfmt(dev, &be->backend_watch, NULL,
+				   backend_changed,
 				   "%s/%s", dev->nodename, "physical-device");
 	if (err)
 		goto fail;
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 00f6f8dc56c8..6f10e0998f1c 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -824,7 +824,7 @@ static void connect(struct backend_info *be)
 	xenvif_carrier_on(be->vif);
 
 	unregister_hotplug_status_watch(be);
-	err = xenbus_watch_pathfmt(dev, &be->hotplug_status_watch,
+	err = xenbus_watch_pathfmt(dev, &be->hotplug_status_watch, NULL,
 				   hotplug_status_changed,
 				   "%s/%s", dev->nodename, "hotplug-status");
 	if (!err)
diff --git a/drivers/xen/xen-pciback/xenbus.c b/drivers/xen/xen-pciback/xenbus.c
index 4b99ec3dec58..e7c692cfb2cf 100644
--- a/drivers/xen/xen-pciback/xenbus.c
+++ b/drivers/xen/xen-pciback/xenbus.c
@@ -689,7 +689,7 @@ static int xen_pcibk_xenbus_probe(struct xenbus_device *dev,
 
 	/* watch the backend node for backend configuration information */
 	err = xenbus_watch_path(dev, dev->nodename, &pdev->be_watch,
-				xen_pcibk_be_watch);
+				NULL, xen_pcibk_be_watch);
 	if (err)
 		goto out;
 
diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index 0a21a12d9c34..0cd728961fce 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -127,19 +127,22 @@ EXPORT_SYMBOL_GPL(xenbus_strstate);
  */
 int xenbus_watch_path(struct xenbus_device *dev, const char *path,
 		      struct xenbus_watch *watch,
+		      bool (*will_handle)(struct xenbus_watch *,
+					  const char *, const char *),
 		      void (*callback)(struct xenbus_watch *,
 				       const char *, const char *))
 {
 	int err;
 
 	watch->node = path;
-	watch->will_handle = NULL;
+	watch->will_handle = will_handle;
 	watch->callback = callback;
 
 	err = register_xenbus_watch(watch);
 
 	if (err) {
 		watch->node = NULL;
+		watch->will_handle = NULL;
 		watch->callback = NULL;
 		xenbus_dev_fatal(dev, err, "adding watch on %s", path);
 	}
@@ -166,6 +169,8 @@ EXPORT_SYMBOL_GPL(xenbus_watch_path);
  */
 int xenbus_watch_pathfmt(struct xenbus_device *dev,
 			 struct xenbus_watch *watch,
+			 bool (*will_handle)(struct xenbus_watch *,
+					const char *, const char *),
 			 void (*callback)(struct xenbus_watch *,
 					  const char *, const char *),
 			 const char *pathfmt, ...)
@@ -182,7 +187,7 @@ int xenbus_watch_pathfmt(struct xenbus_device *dev,
 		xenbus_dev_fatal(dev, -ENOMEM, "allocating path for watch");
 		return -ENOMEM;
 	}
-	err = xenbus_watch_path(dev, path, watch, callback);
+	err = xenbus_watch_path(dev, path, watch, will_handle, callback);
 
 	if (err)
 		kfree(path);
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 38725d97d909..4c3d1b84aa0a 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -136,7 +136,7 @@ static int watch_otherend(struct xenbus_device *dev)
 		container_of(dev->dev.bus, struct xen_bus_type, bus);
 
 	return xenbus_watch_pathfmt(dev, &dev->otherend_watch,
-				    bus->otherend_changed,
+				    NULL, bus->otherend_changed,
 				    "%s/%s", dev->otherend, "state");
 }
 
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index baa88bf0b9bc..c8574d1b814c 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -204,10 +204,14 @@ void xenbus_probe(struct work_struct *);
 
 int xenbus_watch_path(struct xenbus_device *dev, const char *path,
 		      struct xenbus_watch *watch,
+		      bool (*will_handle)(struct xenbus_watch *,
+					  const char *, const char *),
 		      void (*callback)(struct xenbus_watch *,
 				       const char *, const char *));
-__printf(4, 5)
+__printf(5, 6)
 int xenbus_watch_pathfmt(struct xenbus_device *dev, struct xenbus_watch *watch,
+			 bool (*will_handle)(struct xenbus_watch *,
+					     const char *, const char *),
 			 void (*callback)(struct xenbus_watch *,
 					  const char *, const char *),
 			 const char *pathfmt, ...);
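
For anyone preparing a backport, the calling convention the patch introduces can be sketched in plain userspace C. Everything prefixed mock_ below, along with only_state() and count_event(), is a hypothetical stand-in and not kernel API. The dispatch rule in mock_deliver() (drop the event when a filter is set and returns false) is an assumption about how the rest of the XSA-349 series consumes 'will_handle'; this patch only stores the pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's struct xenbus_watch. */
struct mock_watch {
	const char *node;
	bool (*will_handle)(struct mock_watch *, const char *, const char *);
	void (*callback)(struct mock_watch *, const char *, const char *);
	unsigned int handled;	/* number of events that reached callback */
};

/* Mirrors the patched xenbus_watch_path(): the filter is stored, not forced to NULL. */
static int mock_watch_path(struct mock_watch *watch, const char *path,
			   bool (*will_handle)(struct mock_watch *,
					       const char *, const char *),
			   void (*callback)(struct mock_watch *,
					    const char *, const char *))
{
	assert(callback != NULL);
	watch->node = path;
	watch->will_handle = will_handle;
	watch->callback = callback;
	return 0;
}

/*
 * Assumed dispatch rule (not part of this patch): an event is dropped when
 * a filter is set and it returns false. Returns true if the event was handled.
 */
static bool mock_deliver(struct mock_watch *watch, const char *path,
			 const char *token)
{
	if (watch->will_handle && !watch->will_handle(watch, path, token))
		return false;
	watch->callback(watch, path, token);
	return true;
}

/* Example filter: accept only paths ending in "/state". */
static bool only_state(struct mock_watch *watch, const char *path,
		       const char *token)
{
	static const char suffix[] = "/state";
	size_t plen = strlen(path), slen = sizeof(suffix) - 1;

	(void)watch;
	(void)token;
	return plen >= slen && strcmp(path + plen - slen, suffix) == 0;
}

/* Example callback: count the events that were actually handled. */
static void count_event(struct mock_watch *watch, const char *path,
			const char *token)
{
	(void)path;
	(void)token;
	watch->handled++;
}
```

Passing NULL as the filter, as every converted caller in the patch does, keeps the old behaviour in this sketch: every event reaches the callback.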


FAILED: patch "[PATCH] xen-blkback: set ring->xenblkd to NULL after kthread_stop()" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

>From 1c728719a4da6e654afb9cc047164755072ed7c9 Mon Sep 17 00:00:00 2001
From: Pawel Wieczorkiewicz <wipawel(a)amazon.de>
Date: Mon, 14 Dec 2020 10:25:57 +0100
Subject: [PATCH] xen-blkback: set ring->xenblkd to NULL after kthread_stop()

When xen_blkif_disconnect() is called, the kernel thread behind the
block interface is stopped by calling kthread_stop(ring->xenblkd).
The ring->xenblkd thread pointer being non-NULL determines if the
thread has been already stopped.
Normally, the thread's function xen_blkif_schedule() sets the
ring->xenblkd to NULL, when the thread's main loop ends.

However, when the thread has not been started yet (i.e.
wake_up_process() has not been called on it), the xen_blkif_schedule()
function would not be called yet.

In such case the kthread_stop() call returns -EINTR and the
ring->xenblkd remains dangling.
When this happens, any consecutive call to xen_blkif_disconnect (for
example in frontend_changed() callback) leads to a kernel crash in
kthread_stop() (e.g. NULL pointer dereference in exit_creds()).

This is XSA-350.

Cc: <stable(a)vger.kernel.org> # 4.12
Fixes: a24fa22ce22a ("xen/blkback: don't use xen_blkif_get() in xen-blkback kthread")
Reported-by: Olivier Benjamin <oliben(a)amazon.com>
Reported-by: Pawel Wieczorkiewicz <wipawel(a)amazon.de>
Signed-off-by: Pawel Wieczorkiewicz <wipawel(a)amazon.de>
Reviewed-by: Julien Grall <jgrall(a)amazon.com>
Reviewed-by: Juergen Gross <jgross(a)suse.com>
Signed-off-by: Juergen Gross <jgross(a)suse.com>

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 1d8b8d24496c..9860d4842f36 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -274,6 +274,7 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 
 		if (ring->xenblkd) {
 			kthread_stop(ring->xenblkd);
+			ring->xenblkd = NULL;
 			wake_up(&ring->shutdown_wq);
 		}
 
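
The failure mode described in the commit message can be modelled outside the kernel. The sketch below is a userspace illustration only: mock_thread, mock_ring, mock_kthread_stop() and mock_disconnect() are hypothetical names, and the stand-in for kthread_stop() simply frees the object so that a second stop on the same pointer would be a use-after-free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
struct mock_thread {
	int started;	/* 0: created but never woken, as in the XSA-350 case */
};

struct mock_ring {
	struct mock_thread *xenblkd;
	int stop_calls;	/* how many times the thread was stopped */
};

/*
 * Stand-in for kthread_stop(): releases the thread object. After this
 * call the old pointer value must not be used again.
 */
static void mock_kthread_stop(struct mock_ring *ring, struct mock_thread *t)
{
	free(t);
	ring->stop_calls++;
}

/* Mirrors the patched hunk in xen_blkif_disconnect(). */
static void mock_disconnect(struct mock_ring *ring)
{
	if (ring->xenblkd) {
		mock_kthread_stop(ring, ring->xenblkd);
		/* The one-line fix: clear the pointer so a repeat call is a no-op. */
		ring->xenblkd = NULL;
	}
}
```

Without the assignment, a second mock_disconnect() on a never-started thread would pass the stale pointer to mock_kthread_stop() again, which is the userspace analogue of the crash reported for consecutive xen_blkif_disconnect() calls.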


FAILED: patch "[PATCH] dma-buf/dma-resv: Respect num_fences when initializing the" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

>From bf8975837dac156c33a4d15d46602700998cb6dd Mon Sep 17 00:00:00 2001
From: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Date: Tue, 24 Nov 2020 12:57:07 +0100
Subject: [PATCH] dma-buf/dma-resv: Respect num_fences when initializing the shared fence list.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We hardcode the maximum number of shared fences to 4, instead of
respecting num_fences. Use a minimum of 4, but more if num_fences
is higher.

This seems to have been an oversight when first implementing the
api.

Fixes: 04a5faa8cbe5 ("reservation: update api and add some helpers")
Cc: <stable(a)vger.kernel.org> # v3.17+
Reported-by: Niranjana Vishwanathapura <niranjana.vishwanathapura(a)intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201124115707.406917-1-maart…

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index bb5a42b10c29..6ddbeb5dfbf6 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -200,7 +200,7 @@ int dma_resv_reserve_shared(struct dma_resv *obj, unsigned int num_fences)
 		max = max(old->shared_count + num_fences,
 			  old->shared_max * 2);
 	} else {
-		max = 4;
+		max = max(4ul, roundup_pow_of_two(num_fences));
 	}
 
 	new = dma_resv_list_alloc(max);
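
The sizing rule in the changed line can be checked with a small userspace sketch. roundup_pow2() and initial_shared_max() are hypothetical helper names; the first is a simple loop standing in for the kernel's roundup_pow_of_two() and, unlike the kernel macro, is defined to return 1 for an input of 0.

```c
#include <assert.h>

/*
 * Userspace stand-in for roundup_pow_of_two(): smallest power of two
 * that is >= n. Returns 1 for n == 0 (an assumption of this sketch;
 * the kernel helper leaves that case undefined).
 */
static unsigned long roundup_pow2(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Mirrors the patched branch: max(4ul, roundup_pow_of_two(num_fences)). */
static unsigned long initial_shared_max(unsigned int num_fences)
{
	unsigned long rounded = roundup_pow2(num_fences);

	return rounded > 4ul ? rounded : 4ul;
}
```

With the old code the result was 4 regardless of num_fences, so a first reservation of, say, 9 fences would allocate too few slots; under the patched rule the same request yields 16.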


FAILED: patch "[PATCH] dma-buf/dma-resv: Respect num_fences when initializing the" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

>From bf8975837dac156c33a4d15d46602700998cb6dd Mon Sep 17 00:00:00 2001
From: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Date: Tue, 24 Nov 2020 12:57:07 +0100
Subject: [PATCH] dma-buf/dma-resv: Respect num_fences when initializing the shared fence list.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We hardcode the maximum number of shared fences to 4, instead of
respecting num_fences. Use a minimum of 4, but more if num_fences
is higher.

This seems to have been an oversight when first implementing the
api.

Fixes: 04a5faa8cbe5 ("reservation: update api and add some helpers")
Cc: <stable(a)vger.kernel.org> # v3.17+
Reported-by: Niranjana Vishwanathapura <niranjana.vishwanathapura(a)intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201124115707.406917-1-maart…

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index bb5a42b10c29..6ddbeb5dfbf6 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -200,7 +200,7 @@ int dma_resv_reserve_shared(struct dma_resv *obj, unsigned int num_fences)
 		max = max(old->shared_count + num_fences,
 			  old->shared_max * 2);
 	} else {
-		max = 4;
+		max = max(4ul, roundup_pow_of_two(num_fences));
 	}
 
 	new = dma_resv_list_alloc(max);


