--------------------
Note, this will be the LAST 5.14.y kernel release. After this one it
will be marked end-of-life. Please move to the 5.15.y tree at this
point in time.
--------------------
This is the start of the stable review cycle for the 5.14.21 release.
There are 15 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 21 Nov 2021 17:14:35 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.14.21-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.14.21-rc1
Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Revert "ACPI: scan: Release PM resources blocked by unused objects"
Subbaraman Narayanamurthy <quic_subbaram(a)quicinc.com>
thermal: Fix NULL pointer dereferences in of_thermal_ functions
Greg Thelen <gthelen(a)google.com>
perf/core: Avoid put_page() when GUP fails
Marc Zyngier <maz(a)kernel.org>
PCI: Add MSI masking quirk for Nvidia ION AHCI
Marc Zyngier <maz(a)kernel.org>
PCI/MSI: Deal with devices lying about their MSI mask capability
Thomas Gleixner <tglx(a)linutronix.de>
PCI/MSI: Destroy sysfs before freeing entries
Sven Schnelle <svens(a)stackframe.org>
parisc/entry: fix trace test in syscall exit path
Nicholas Flintham <nick(a)flinny.org>
Bluetooth: btusb: Add support for TP-Link UB500 Adapter
Masami Hiramatsu <mhiramat(a)kernel.org>
bootconfig: init: Fix memblock leak in xbc_make_cmdline()
Xie Yongji <xieyongji(a)bytedance.com>
loop: Use blk_validate_block_size() to validate block size
Xie Yongji <xieyongji(a)bytedance.com>
block: Add a helper to validate the block size
Kees Cook <keescook(a)chromium.org>
fortify: Explicitly disable Clang support
David Woodhouse <dwmw(a)amazon.co.uk>
KVM: Fix steal time asm constraints
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "drm: fb_helper: fix CONFIG_FB dependency"
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "drm: fb_helper: improve CONFIG_FB dependency"
-------------
Diffstat:
Makefile | 4 ++--
arch/parisc/kernel/entry.S | 2 +-
arch/x86/kvm/x86.c | 6 +++---
drivers/acpi/glue.c | 25 -------------------------
drivers/acpi/internal.h | 1 -
drivers/acpi/scan.c | 6 ------
drivers/block/loop.c | 17 ++---------------
drivers/bluetooth/btusb.c | 4 ++++
drivers/gpu/drm/Kconfig | 5 +++--
drivers/pci/msi.c | 27 +++++++++++++++------------
drivers/pci/quirks.c | 6 ++++++
drivers/thermal/thermal_of.c | 9 ++++++---
include/linux/blkdev.h | 8 ++++++++
include/linux/pci.h | 2 ++
init/main.c | 1 +
kernel/events/core.c | 10 +++++-----
security/Kconfig | 3 +++
17 files changed, 61 insertions(+), 75 deletions(-)
Hi,
I went through all changes in batman-adv since v4.19 with a Fixes: line
and checked whether they were backported to the LTS kernels. The ones which
weren't ported and applied to this branch are now part of this patch series.
There are also following three patches included:
* batman-adv: Consider fragmentation for needed_headroom
* batman-adv: Reserve needed_*room for fragments
* batman-adv: Don't always reallocate the fragmentation skb head
which could in some circumstances cause packet loss but which were created
to fix high CPU load/low throughput problems. But I've added them here
anyway because the corresponding VXLAN patches were also added to stable.
And some stable kernels also got these fixes a while back.
Kind regards,
Sven
Linus Lüssing (1):
batman-adv: mcast: fix duplicate mcast packets in BLA backbone from
LAN
Sven Eckelmann (3):
batman-adv: Consider fragmentation for needed_headroom
batman-adv: Reserve needed_*room for fragments
batman-adv: Don't always reallocate the fragmentation skb head
net/batman-adv/fragmentation.c | 26 ++++++++++++++++----------
net/batman-adv/hard-interface.c | 3 +++
net/batman-adv/multicast.c | 31 +++++++++++++++++++++++++++++++
net/batman-adv/multicast.h | 15 +++++++++++++++
net/batman-adv/soft-interface.c | 5 ++---
5 files changed, 67 insertions(+), 13 deletions(-)
--
2.30.2
Hi,
I went through all changes in batman-adv since v4.14 with a Fixes: line
and checked whether they were backported to the LTS kernels. The ones which
weren't ported and applied to this branch are now part of this patch series.
There are also following three patches included:
* batman-adv: Consider fragmentation for needed_headroom
* batman-adv: Reserve needed_*room for fragments
* batman-adv: Don't always reallocate the fragmentation skb head
which could in some circumstances cause packet loss but which were created
to fix high CPU load/low throughput problems. But I've added them here
anyway because the corresponding VXLAN patches were also added to stable.
And some stable kernels also got these fixes a while back.
Kind regards,
Sven
Linus Lüssing (2):
batman-adv: mcast: fix duplicate mcast packets in BLA backbone from
LAN
batman-adv: mcast: fix duplicate mcast packets from BLA backbone to
mesh
Sven Eckelmann (3):
batman-adv: Consider fragmentation for needed_headroom
batman-adv: Reserve needed_*room for fragments
batman-adv: Don't always reallocate the fragmentation skb head
net/batman-adv/bridge_loop_avoidance.c | 103 +++++++++++++++++++++----
net/batman-adv/fragmentation.c | 26 ++++---
net/batman-adv/hard-interface.c | 3 +
net/batman-adv/multicast.c | 31 ++++++++
net/batman-adv/multicast.h | 15 ++++
net/batman-adv/soft-interface.c | 5 +-
6 files changed, 154 insertions(+), 29 deletions(-)
--
2.30.2
Hi,
I went through all changes in batman-adv since v4.9 with a Fixes: line
and checked whether they were backported to the LTS kernels. The ones which
weren't ported and applied to this branch are now part of this patch series.
There are also following three patches included:
* batman-adv: Consider fragmentation for needed_headroom
* batman-adv: Reserve needed_*room for fragments
* batman-adv: Don't always reallocate the fragmentation skb head
which could in some circumstances cause packet loss but which were created
to fix high CPU load/low throughput problems. But I've added them here
anyway because the corresponding VXLAN patches were also added to stable.
And some stable kernels also got these fixes a while back.
Kind regards,
Sven
Linus Lüssing (3):
batman-adv: Fix own OGM check in aggregated OGMs
batman-adv: mcast: fix duplicate mcast packets in BLA backbone from
LAN
batman-adv: mcast: fix duplicate mcast packets from BLA backbone to
mesh
Sven Eckelmann (4):
batman-adv: Keep fragments equally sized
batman-adv: Consider fragmentation for needed_headroom
batman-adv: Reserve needed_*room for fragments
batman-adv: Don't always reallocate the fragmentation skb head
net/batman-adv/bat_v_ogm.c | 11 +--
net/batman-adv/bridge_loop_avoidance.c | 103 +++++++++++++++++++++----
net/batman-adv/fragmentation.c | 42 ++++++----
net/batman-adv/hard-interface.c | 3 +
net/batman-adv/multicast.c | 31 ++++++++
net/batman-adv/multicast.h | 15 ++++
net/batman-adv/soft-interface.c | 5 +-
7 files changed, 172 insertions(+), 38 deletions(-)
--
2.30.2
From: Ard Biesheuvel <ardb(a)kernel.org>
Subject: kmap_local: don't assume kmap PTEs are linear arrays in memory
The kmap_local conversion broke the ARM architecture, because the new code
assumes that all PTEs used for creating kmaps form a linear array in
memory, and uses array indexing to look up the kmap PTE belonging to a
certain kmap index.
On ARM, this cannot work, not only because the PTE pages may be
non-adjacent in memory, but also because ARM/!LPAE interleaves hardware
entries and extended entries (carrying software-only bits) in a way that
is not compatible with array indexing.
Fortunately, this only seems to affect configurations with more than 8
CPUs, due to the way the per-CPU kmap slots are organized in memory.
Work around this by permitting an architecture to set a Kconfig symbol
that signifies that the kmap PTEs do not form a lineary array in memory,
and so the only way to locate the appropriate one is to walk the page
tables.
Link: https://lore.kernel.org/linux-arm-kernel/20211026131249.3731275-1-ardb@kern…
Link: https://lkml.kernel.org/r/20211116094737.7391-1-ardb@kernel.org
Fixes: 2a15ba82fa6c ("ARM: highmem: Switch to generic kmap atomic")
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Reported-by: Quanyang Wang <quanyang.wang(a)windriver.com>
Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org>
Acked-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/arm/Kconfig | 1 +
mm/Kconfig | 3 +++
mm/highmem.c | 32 +++++++++++++++++++++-----------
3 files changed, 25 insertions(+), 11 deletions(-)
--- a/arch/arm/Kconfig~kmap_local-dont-assume-kmap-ptes-are-linear-arrays-in-memory
+++ a/arch/arm/Kconfig
@@ -1463,6 +1463,7 @@ config HIGHMEM
bool "High Memory Support"
depends on MMU
select KMAP_LOCAL
+ select KMAP_LOCAL_NON_LINEAR_PTE_ARRAY
help
The address space of ARM processors is only 4 Gigabytes large
and it has to accommodate user address space, kernel address
--- a/mm/highmem.c~kmap_local-dont-assume-kmap-ptes-are-linear-arrays-in-memory
+++ a/mm/highmem.c
@@ -503,16 +503,22 @@ static inline int kmap_local_calc_idx(in
static pte_t *__kmap_pte;
-static pte_t *kmap_get_pte(void)
+static pte_t *kmap_get_pte(unsigned long vaddr, int idx)
{
+ if (IS_ENABLED(CONFIG_KMAP_LOCAL_NON_LINEAR_PTE_ARRAY))
+ /*
+ * Set by the arch if __kmap_pte[-idx] does not produce
+ * the correct entry.
+ */
+ return virt_to_kpte(vaddr);
if (!__kmap_pte)
__kmap_pte = virt_to_kpte(__fix_to_virt(FIX_KMAP_BEGIN));
- return __kmap_pte;
+ return &__kmap_pte[-idx];
}
void *__kmap_local_pfn_prot(unsigned long pfn, pgprot_t prot)
{
- pte_t pteval, *kmap_pte = kmap_get_pte();
+ pte_t pteval, *kmap_pte;
unsigned long vaddr;
int idx;
@@ -524,9 +530,10 @@ void *__kmap_local_pfn_prot(unsigned lon
preempt_disable();
idx = arch_kmap_local_map_idx(kmap_local_idx_push(), pfn);
vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
- BUG_ON(!pte_none(*(kmap_pte - idx)));
+ kmap_pte = kmap_get_pte(vaddr, idx);
+ BUG_ON(!pte_none(*kmap_pte));
pteval = pfn_pte(pfn, prot);
- arch_kmap_local_set_pte(&init_mm, vaddr, kmap_pte - idx, pteval);
+ arch_kmap_local_set_pte(&init_mm, vaddr, kmap_pte, pteval);
arch_kmap_local_post_map(vaddr, pteval);
current->kmap_ctrl.pteval[kmap_local_idx()] = pteval;
preempt_enable();
@@ -559,7 +566,7 @@ EXPORT_SYMBOL(__kmap_local_page_prot);
void kunmap_local_indexed(void *vaddr)
{
unsigned long addr = (unsigned long) vaddr & PAGE_MASK;
- pte_t *kmap_pte = kmap_get_pte();
+ pte_t *kmap_pte;
int idx;
if (addr < __fix_to_virt(FIX_KMAP_END) ||
@@ -584,8 +591,9 @@ void kunmap_local_indexed(void *vaddr)
idx = arch_kmap_local_unmap_idx(kmap_local_idx(), addr);
WARN_ON_ONCE(addr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
+ kmap_pte = kmap_get_pte(addr, idx);
arch_kmap_local_pre_unmap(addr);
- pte_clear(&init_mm, addr, kmap_pte - idx);
+ pte_clear(&init_mm, addr, kmap_pte);
arch_kmap_local_post_unmap(addr);
current->kmap_ctrl.pteval[kmap_local_idx()] = __pte(0);
kmap_local_idx_pop();
@@ -607,7 +615,7 @@ EXPORT_SYMBOL(kunmap_local_indexed);
void __kmap_local_sched_out(void)
{
struct task_struct *tsk = current;
- pte_t *kmap_pte = kmap_get_pte();
+ pte_t *kmap_pte;
int i;
/* Clear kmaps */
@@ -634,8 +642,9 @@ void __kmap_local_sched_out(void)
idx = arch_kmap_local_map_idx(i, pte_pfn(pteval));
addr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
+ kmap_pte = kmap_get_pte(addr, idx);
arch_kmap_local_pre_unmap(addr);
- pte_clear(&init_mm, addr, kmap_pte - idx);
+ pte_clear(&init_mm, addr, kmap_pte);
arch_kmap_local_post_unmap(addr);
}
}
@@ -643,7 +652,7 @@ void __kmap_local_sched_out(void)
void __kmap_local_sched_in(void)
{
struct task_struct *tsk = current;
- pte_t *kmap_pte = kmap_get_pte();
+ pte_t *kmap_pte;
int i;
/* Restore kmaps */
@@ -663,7 +672,8 @@ void __kmap_local_sched_in(void)
/* See comment in __kmap_local_sched_out() */
idx = arch_kmap_local_map_idx(i, pte_pfn(pteval));
addr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
- set_pte_at(&init_mm, addr, kmap_pte - idx, pteval);
+ kmap_pte = kmap_get_pte(addr, idx);
+ set_pte_at(&init_mm, addr, kmap_pte, pteval);
arch_kmap_local_post_map(addr, pteval);
}
}
--- a/mm/Kconfig~kmap_local-dont-assume-kmap-ptes-are-linear-arrays-in-memory
+++ a/mm/Kconfig
@@ -890,6 +890,9 @@ config MAPPING_DIRTY_HELPERS
config KMAP_LOCAL
bool
+config KMAP_LOCAL_NON_LINEAR_PTE_ARRAY
+ bool
+
# struct io_mapping based helper. Selected by drivers that need them
config IO_MAPPING
bool
_
From: Mina Almasry <almasrymina(a)google.com>
Subject: hugetlb, userfaultfd: fix reservation restore on userfaultfd error
Currently in the is_continue case in hugetlb_mcopy_atomic_pte(), if we
bail out using "goto out_release_unlock;" in the cases where idx >= size,
or !huge_pte_none(), the code will detect that new_pagecache_page ==
false, and so call restore_reserve_on_error(). In this case I see
restore_reserve_on_error() delete the reservation, and the following call
to remove_inode_hugepages() will increment h->resv_hugepages causing a
100% reproducible leak.
We should treat the is_continue case similar to adding a page into the
pagecache and set new_pagecache_page to true, to indicate that there is no
reservation to restore on the error path, and we need not call
restore_reserve_on_error(). Rename new_pagecache_page to
page_in_pagecache to make that clear.
Link: https://lkml.kernel.org/r/20211117193825.378528-1-almasrymina@google.com
Fixes: c7b1850dfb41 ("hugetlb: don't pass page cache pages to restore_reserve_on_error")
Signed-off-by: Mina Almasry <almasrymina(a)google.com>
Reported-by: James Houghton <jthoughton(a)google.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Wei Xu <weixugc(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--- a/mm/hugetlb.c~hugetlb-userfaultfd-fix-reservation-restore-on-userfaultfd-error
+++ a/mm/hugetlb.c
@@ -5736,13 +5736,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_s
int ret = -ENOMEM;
struct page *page;
int writable;
- bool new_pagecache_page = false;
+ bool page_in_pagecache = false;
if (is_continue) {
ret = -EFAULT;
page = find_lock_page(mapping, idx);
if (!page)
goto out;
+ page_in_pagecache = true;
} else if (!*pagep) {
/* If a page already exists, then it's UFFDIO_COPY for
* a non-missing case. Return -EEXIST.
@@ -5830,7 +5831,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_s
ret = huge_add_to_page_cache(page, mapping, idx);
if (ret)
goto out_release_nounlock;
- new_pagecache_page = true;
+ page_in_pagecache = true;
}
ptl = huge_pte_lockptr(h, dst_mm, dst_pte);
@@ -5894,7 +5895,7 @@ out_release_unlock:
if (vm_shared || is_continue)
unlock_page(page);
out_release_nounlock:
- if (!new_pagecache_page)
+ if (!page_in_pagecache)
restore_reserve_on_error(h, dst_vma, dst_addr, page);
put_page(page);
goto out;
_
From: Rustam Kovhaev <rkovhaev(a)gmail.com>
Subject: mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag
When kmemleak is enabled for SLOB, system does not boot and does not print
anything to the console. At the very early stage in the boot process we
hit infinite recursion from kmemleak_init() and eventually kernel crashes.
kmemleak_init() specifies SLAB_NOLEAKTRACE for KMEM_CACHE(), but
kmem_cache_create_usercopy() removes it because CACHE_CREATE_MASK is not
valid for SLOB.
Let's fix CACHE_CREATE_MASK and make kmemleak work with SLOB
Link: https://lkml.kernel.org/r/20211115020850.3154366-1-rkovhaev@gmail.com
Fixes: d8843922fba4 ("slab: Ignore internal flags in cache creation")
Signed-off-by: Rustam Kovhaev <rkovhaev(a)gmail.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Reviewed-by: Muchun Song <songmuchun(a)bytedance.com>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Glauber Costa <glommer(a)parallels.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/slab.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/slab.h~mm-kmemleak-slob-respect-slab_noleaktrace-flag
+++ a/mm/slab.h
@@ -147,7 +147,7 @@ static inline slab_flags_t kmem_cache_fl
#define SLAB_CACHE_FLAGS (SLAB_NOLEAKTRACE | SLAB_RECLAIM_ACCOUNT | \
SLAB_TEMPORARY | SLAB_ACCOUNT)
#else
-#define SLAB_CACHE_FLAGS (0)
+#define SLAB_CACHE_FLAGS (SLAB_NOLEAKTRACE)
#endif
/* Common flags available with current configuration */
_
From: Nathan Chancellor <nathan(a)kernel.org>
Subject: hexagon: clean up timer-regs.h
When building allmodconfig, there is a warning about TIMER_ENABLE being
redefined:
drivers/clocksource/timer-oxnas-rps.c:39:9: error: 'TIMER_ENABLE' macro redefined [-Werror,-Wmacro-redefined]
#define TIMER_ENABLE BIT(7)
^
./arch/hexagon/include/asm/timer-regs.h:13:9: note: previous definition is here
#define TIMER_ENABLE 0
^
1 error generated.
The values in this header are only used in one file each, if they are
used at all. Remove the header and sink all of the constants into their
respective files.
TCX0_CLK_RATE is only used in arch/hexagon/include/asm/timex.h
TIMER_ENABLE, RTOS_TIMER_INT, RTOS_TIMER_REGS_ADDR are only used in
arch/hexagon/kernel/time.c.
SLEEP_CLK_RATE and TIMER_CLR_ON_MATCH have both been unused since the
file's introduction in commit 71e4a47f32f4 ("Hexagon: Add time and timer
functions").
TIMER_ENABLE is redefined as BIT(0) so the shift is moved into the
definition, rather than its use.
Link: https://lkml.kernel.org/r/20211115174250.1994179-3-nathan@kernel.org
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Acked-by: Brian Cain <bcain(a)codeaurora.org>
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/hexagon/include/asm/timer-regs.h | 26 ------------------------
arch/hexagon/include/asm/timex.h | 3 --
arch/hexagon/kernel/time.c | 12 +++++++++--
3 files changed, 11 insertions(+), 30 deletions(-)
--- a/arch/hexagon/include/asm/timer-regs.h
+++ /dev/null
@@ -1,26 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Timer support for Hexagon
- *
- * Copyright (c) 2010-2011, The Linux Foundation. All rights reserved.
- */
-
-#ifndef _ASM_TIMER_REGS_H
-#define _ASM_TIMER_REGS_H
-
-/* This stuff should go into a platform specific file */
-#define TCX0_CLK_RATE 19200
-#define TIMER_ENABLE 0
-#define TIMER_CLR_ON_MATCH 1
-
-/*
- * 8x50 HDD Specs 5-8. Simulator co-sim not fixed until
- * release 1.1, and then it's "adjustable" and probably not defaulted.
- */
-#define RTOS_TIMER_INT 3
-#ifdef CONFIG_HEXAGON_COMET
-#define RTOS_TIMER_REGS_ADDR 0xAB000000UL
-#endif
-#define SLEEP_CLK_RATE 32000
-
-#endif
--- a/arch/hexagon/include/asm/timex.h~hexagon-clean-up-timer-regsh
+++ a/arch/hexagon/include/asm/timex.h
@@ -7,11 +7,10 @@
#define _ASM_TIMEX_H
#include <asm-generic/timex.h>
-#include <asm/timer-regs.h>
#include <asm/hexagon_vm.h>
/* Using TCX0 as our clock. CLOCK_TICK_RATE scheduled to be removed. */
-#define CLOCK_TICK_RATE TCX0_CLK_RATE
+#define CLOCK_TICK_RATE 19200
#define ARCH_HAS_READ_CURRENT_TIMER
--- a/arch/hexagon/kernel/time.c~hexagon-clean-up-timer-regsh
+++ a/arch/hexagon/kernel/time.c
@@ -17,9 +17,10 @@
#include <linux/of_irq.h>
#include <linux/module.h>
-#include <asm/timer-regs.h>
#include <asm/hexagon_vm.h>
+#define TIMER_ENABLE BIT(0)
+
/*
* For the clocksource we need:
* pcycle frequency (600MHz)
@@ -33,6 +34,13 @@ cycles_t pcycle_freq_mhz;
cycles_t thread_freq_mhz;
cycles_t sleep_clk_freq;
+/*
+ * 8x50 HDD Specs 5-8. Simulator co-sim not fixed until
+ * release 1.1, and then it's "adjustable" and probably not defaulted.
+ */
+#define RTOS_TIMER_INT 3
+#define RTOS_TIMER_REGS_ADDR 0xAB000000UL
+
static struct resource rtos_timer_resources[] = {
{
.start = RTOS_TIMER_REGS_ADDR,
@@ -80,7 +88,7 @@ static int set_next_event(unsigned long
iowrite32(0, &rtos_timer->clear);
iowrite32(delta, &rtos_timer->match);
- iowrite32(1 << TIMER_ENABLE, &rtos_timer->enable);
+ iowrite32(TIMER_ENABLE, &rtos_timer->enable);
return 0;
}
_
From: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Subject: shm: extend forced shm destroy to support objects from several IPC nses
Currently, exit_shm function not designed to work properly when
task->sysvshm.shm_clist holds shm objects from different IPC namespaces.
This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it
leads to use-after-free (reproducer exists).
That particular patch is attempt to fix the problem by extending exit_shm
mechanism to handle shm's destroy from several IPC ns'es.
To achieve that we do several things:
1. add namespace (non-refcounted) pointer to the struct shmid_kernel
2. during new shm object creation (newseg()/shmget syscall) we
initialize this pointer by current task IPC ns
3. exit_shm() fully reworked such that it traverses over all shp's in
task->sysvshm.shm_clist and gets IPC namespace not from current task as
it was before but from shp's object itself, then call shm_destroy(shp,
ns).
Note. We need to be really careful here, because as it was said before
(1), our pointer to IPC ns non-refcnt'ed. To be on the safe side we using
special helper get_ipc_ns_not_zero() which allows to get IPC ns refcounter
only if IPC ns not in the "state of destruction".
Q/A
Q: Why we can access shp->ns memory using non-refcounted pointer?
A: Because shp object lifetime is always shorther
than IPC namespace lifetime, so, if we get shp object from the
task->sysvshm.shm_clist while holding task_lock(task) nobody can
steal our namespace.
Q: Does this patch change semantics of unshare/setns/clone syscalls?
A: Not. It's just fixes non-covered case when process may leave
IPC namespace without getting task->sysvshm.shm_clist list cleaned up.
Link: https://lkml.kernel.org/r/67bb03e5-f79c-1815-e2bf-949c67047418@colorfullife…
Link: https://lkml.kernel.org/r/20211109151501.4921-1-manfred@colorfullife.com
Fixes: ab602f79915 ("shm: make exit_shm work proportional to task activity")
Co-developed-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: Andrei Vagin <avagin(a)gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Cc: Vasily Averin <vvs(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/ipc_namespace.h | 15 ++
include/linux/sched/task.h | 2
ipc/shm.c | 189 ++++++++++++++++++++++++--------
3 files changed, 159 insertions(+), 47 deletions(-)
--- a/include/linux/ipc_namespace.h~shm-extend-forced-shm-destroy-to-support-objects-from-several-ipc-nses-simplified
+++ a/include/linux/ipc_namespace.h
@@ -131,6 +131,16 @@ static inline struct ipc_namespace *get_
return ns;
}
+static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns)
+{
+ if (ns) {
+ if (refcount_inc_not_zero(&ns->ns.count))
+ return ns;
+ }
+
+ return NULL;
+}
+
extern void put_ipc_ns(struct ipc_namespace *ns);
#else
static inline struct ipc_namespace *copy_ipcs(unsigned long flags,
@@ -146,6 +156,11 @@ static inline struct ipc_namespace *get_
{
return ns;
}
+
+static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns)
+{
+ return ns;
+}
static inline void put_ipc_ns(struct ipc_namespace *ns)
{
--- a/include/linux/sched/task.h~shm-extend-forced-shm-destroy-to-support-objects-from-several-ipc-nses-simplified
+++ a/include/linux/sched/task.h
@@ -158,7 +158,7 @@ static inline struct vm_struct *task_sta
* Protects ->fs, ->files, ->mm, ->group_info, ->comm, keyring
* subscriptions and synchronises with wait4(). Also used in procfs. Also
* pins the final release of task.io_context. Also protects ->cpuset and
- * ->cgroup.subsys[]. And ->vfork_done.
+ * ->cgroup.subsys[]. And ->vfork_done. And ->sysvshm.shm_clist.
*
* Nests both inside and outside of read_lock(&tasklist_lock).
* It must not be nested with write_lock_irq(&tasklist_lock),
--- a/ipc/shm.c~shm-extend-forced-shm-destroy-to-support-objects-from-several-ipc-nses-simplified
+++ a/ipc/shm.c
@@ -62,9 +62,18 @@ struct shmid_kernel /* private to the ke
struct pid *shm_lprid;
struct ucounts *mlock_ucounts;
- /* The task created the shm object. NULL if the task is dead. */
+ /*
+ * The task created the shm object, for
+ * task_lock(shp->shm_creator)
+ */
struct task_struct *shm_creator;
- struct list_head shm_clist; /* list by creator */
+
+ /*
+ * List by creator. task_lock(->shm_creator) required for read/write.
+ * If list_empty(), then the creator is dead already.
+ */
+ struct list_head shm_clist;
+ struct ipc_namespace *ns;
} __randomize_layout;
/* shm_mode upper byte flags */
@@ -115,6 +124,7 @@ static void do_shm_rmid(struct ipc_names
struct shmid_kernel *shp;
shp = container_of(ipcp, struct shmid_kernel, shm_perm);
+ WARN_ON(ns != shp->ns);
if (shp->shm_nattch) {
shp->shm_perm.mode |= SHM_DEST;
@@ -225,10 +235,43 @@ static void shm_rcu_free(struct rcu_head
kfree(shp);
}
-static inline void shm_rmid(struct ipc_namespace *ns, struct shmid_kernel *s)
+/*
+ * It has to be called with shp locked.
+ * It must be called before ipc_rmid()
+ */
+static inline void shm_clist_rm(struct shmid_kernel *shp)
+{
+ struct task_struct *creator;
+
+ /* ensure that shm_creator does not disappear */
+ rcu_read_lock();
+
+ /*
+ * A concurrent exit_shm may do a list_del_init() as well.
+ * Just do nothing if exit_shm already did the work
+ */
+ if (!list_empty(&shp->shm_clist)) {
+ /*
+ * shp->shm_creator is guaranteed to be valid *only*
+ * if shp->shm_clist is not empty.
+ */
+ creator = shp->shm_creator;
+
+ task_lock(creator);
+ /*
+ * list_del_init() is a nop if the entry was already removed
+ * from the list.
+ */
+ list_del_init(&shp->shm_clist);
+ task_unlock(creator);
+ }
+ rcu_read_unlock();
+}
+
+static inline void shm_rmid(struct shmid_kernel *s)
{
- list_del(&s->shm_clist);
- ipc_rmid(&shm_ids(ns), &s->shm_perm);
+ shm_clist_rm(s);
+ ipc_rmid(&shm_ids(s->ns), &s->shm_perm);
}
@@ -283,7 +326,7 @@ static void shm_destroy(struct ipc_names
shm_file = shp->shm_file;
shp->shm_file = NULL;
ns->shm_tot -= (shp->shm_segsz + PAGE_SIZE - 1) >> PAGE_SHIFT;
- shm_rmid(ns, shp);
+ shm_rmid(shp);
shm_unlock(shp);
if (!is_file_hugepages(shm_file))
shmem_lock(shm_file, 0, shp->mlock_ucounts);
@@ -303,10 +346,10 @@ static void shm_destroy(struct ipc_names
*
* 2) sysctl kernel.shm_rmid_forced is set to 1.
*/
-static bool shm_may_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
+static bool shm_may_destroy(struct shmid_kernel *shp)
{
return (shp->shm_nattch == 0) &&
- (ns->shm_rmid_forced ||
+ (shp->ns->shm_rmid_forced ||
(shp->shm_perm.mode & SHM_DEST));
}
@@ -337,7 +380,7 @@ static void shm_close(struct vm_area_str
ipc_update_pid(&shp->shm_lprid, task_tgid(current));
shp->shm_dtim = ktime_get_real_seconds();
shp->shm_nattch--;
- if (shm_may_destroy(ns, shp))
+ if (shm_may_destroy(shp))
shm_destroy(ns, shp);
else
shm_unlock(shp);
@@ -358,10 +401,10 @@ static int shm_try_destroy_orphaned(int
*
* As shp->* are changed under rwsem, it's safe to skip shp locking.
*/
- if (shp->shm_creator != NULL)
+ if (!list_empty(&shp->shm_clist))
return 0;
- if (shm_may_destroy(ns, shp)) {
+ if (shm_may_destroy(shp)) {
shm_lock_by_ptr(shp);
shm_destroy(ns, shp);
}
@@ -379,48 +422,97 @@ void shm_destroy_orphaned(struct ipc_nam
/* Locking assumes this will only be called with task == current */
void exit_shm(struct task_struct *task)
{
- struct ipc_namespace *ns = task->nsproxy->ipc_ns;
- struct shmid_kernel *shp, *n;
+ for (;;) {
+ struct shmid_kernel *shp;
+ struct ipc_namespace *ns;
- if (list_empty(&task->sysvshm.shm_clist))
- return;
+ task_lock(task);
+
+ if (list_empty(&task->sysvshm.shm_clist)) {
+ task_unlock(task);
+ break;
+ }
+
+ shp = list_first_entry(&task->sysvshm.shm_clist, struct shmid_kernel,
+ shm_clist);
- /*
- * If kernel.shm_rmid_forced is not set then only keep track of
- * which shmids are orphaned, so that a later set of the sysctl
- * can clean them up.
- */
- if (!ns->shm_rmid_forced) {
- down_read(&shm_ids(ns).rwsem);
- list_for_each_entry(shp, &task->sysvshm.shm_clist, shm_clist)
- shp->shm_creator = NULL;
/*
- * Only under read lock but we are only called on current
- * so no entry on the list will be shared.
+ * 1) Get pointer to the ipc namespace. It is worth to say
+ * that this pointer is guaranteed to be valid because
+ * shp lifetime is always shorter than namespace lifetime
+ * in which shp lives.
+ * We taken task_lock it means that shp won't be freed.
*/
- list_del(&task->sysvshm.shm_clist);
- up_read(&shm_ids(ns).rwsem);
- return;
- }
+ ns = shp->ns;
- /*
- * Destroy all already created segments, that were not yet mapped,
- * and mark any mapped as orphan to cover the sysctl toggling.
- * Destroy is skipped if shm_may_destroy() returns false.
- */
- down_write(&shm_ids(ns).rwsem);
- list_for_each_entry_safe(shp, n, &task->sysvshm.shm_clist, shm_clist) {
- shp->shm_creator = NULL;
+ /*
+ * 2) If kernel.shm_rmid_forced is not set then only keep track of
+ * which shmids are orphaned, so that a later set of the sysctl
+ * can clean them up.
+ */
+ if (!ns->shm_rmid_forced)
+ goto unlink_continue;
- if (shm_may_destroy(ns, shp)) {
- shm_lock_by_ptr(shp);
- shm_destroy(ns, shp);
+ /*
+ * 3) get a reference to the namespace.
+ * The refcount could be already 0. If it is 0, then
+ * the shm objects will be free by free_ipc_work().
+ */
+ ns = get_ipc_ns_not_zero(ns);
+ if (!ns) {
+unlink_continue:
+ list_del_init(&shp->shm_clist);
+ task_unlock(task);
+ continue;
}
- }
- /* Remove the list head from any segments still attached. */
- list_del(&task->sysvshm.shm_clist);
- up_write(&shm_ids(ns).rwsem);
+ /*
+ * 4) get a reference to shp.
+ * This cannot fail: shm_clist_rm() is called before
+ * ipc_rmid(), thus the refcount cannot be 0.
+ */
+ WARN_ON(!ipc_rcu_getref(&shp->shm_perm));
+
+ /*
+ * 5) unlink the shm segment from the list of segments
+ * created by current.
+ * This must be done last. After unlinking,
+ * only the refcounts obtained above prevent IPC_RMID
+ * from destroying the segment or the namespace.
+ */
+ list_del_init(&shp->shm_clist);
+
+ task_unlock(task);
+
+ /*
+ * 6) we have all references
+ * Thus lock & if needed destroy shp.
+ */
+ down_write(&shm_ids(ns).rwsem);
+ shm_lock_by_ptr(shp);
+ /*
+ * rcu_read_lock was implicitly taken in shm_lock_by_ptr, it's
+ * safe to call ipc_rcu_putref here
+ */
+ ipc_rcu_putref(&shp->shm_perm, shm_rcu_free);
+
+ if (ipc_valid_object(&shp->shm_perm)) {
+ if (shm_may_destroy(shp))
+ shm_destroy(ns, shp);
+ else
+ shm_unlock(shp);
+ } else {
+ /*
+ * Someone else deleted the shp from namespace
+ * idr/kht while we have waited.
+ * Just unlock and continue.
+ */
+ shm_unlock(shp);
+ }
+
+ up_write(&shm_ids(ns).rwsem);
+ put_ipc_ns(ns); /* paired with get_ipc_ns_not_zero */
+ }
}
static vm_fault_t shm_fault(struct vm_fault *vmf)
@@ -676,7 +768,11 @@ static int newseg(struct ipc_namespace *
if (error < 0)
goto no_id;
+ shp->ns = ns;
+
+ task_lock(current);
list_add(&shp->shm_clist, ¤t->sysvshm.shm_clist);
+ task_unlock(current);
/*
* shmid gets reported as "inode#" in /proc/pid/maps.
@@ -1567,7 +1663,8 @@ out_nattch:
down_write(&shm_ids(ns).rwsem);
shp = shm_lock(ns, shmid);
shp->shm_nattch--;
- if (shm_may_destroy(ns, shp))
+
+ if (shm_may_destroy(shp))
shm_destroy(ns, shp);
else
shm_unlock(shp);
_
From: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Subject: ipc: WARN if trying to remove ipc object which is absent
Patch series "shm: shm_rmid_forced feature fixes".
Some time ago I met kernel crash after CRIU restore procedure,
fortunately, it was CRIU restore, so, I had dump files and could do
restore many times and crash reproduced easily. After some investigation
I've constructed the minimal reproducer. It was found that it's
use-after-free and it happens only if sysctl kernel.shm_rmid_forced = 1.
The key of the problem is that the exit_shm() function not handles shp's
object destroy when task->sysvshm.shm_clist contains items from different
IPC namespaces. In most cases this list will contain only items from one
IPC namespace.
Why this list may contain object from different namespaces? Function
exit_shm() designed to clean up this list always when process leaves IPC
namespace. But we made a mistake a long time ago and not add exit_shm()
call into setns() syscall procedures. 1st second idea was just to add
this call to setns() syscall but it's obviously changes semantics of
setns() syscall and that's userspace-visible change. So, I gave up this
idea.
First real attempt to address the issue was just to omit forced destroy if
we meet shp object not from current task IPC namespace [1]. But that was
not the best idea because task->sysvshm.shm_clist was protected by rwsem
which belongs to current task IPC namespace. It means that list
corruption may occur.
Second approach is just extend exit_shm() to properly handle shp's from
different IPC namespaces [2]. This is really non-trivial thing, I've put
a lot of effort into that but not believed that it's possible to make it
fully safe, clean and clear.
Thanks to the efforts of Manfred Spraul working an elegant solution was
designed. Thanks a lot, Manfred!
Eric also suggested the way to address the issue in ("[RFC][PATCH] shm: In
shm_exit destroy all created and never attached segments") Eric's idea was
to maintain a list of shm_clists one per IPC namespace, use lock-less
lists. But there is some extra memory consumption-related concerns.
Alternative solution which was suggested by me was implemented in ("shm:
reset shm_clist on setns but omit forced shm destroy") Idea is pretty
simple, we add exit_shm() syscall to setns() but DO NOT destroy shm
segments even if sysctl kernel.shm_rmid_forced = 1, we just clean up the
task->sysvshm.shm_clist list. This chages semantics of setns() syscall a
little bit but in comparision to "naive" solution when we just add
exit_shm() without any special exclusions this looks like a safer option.
[1] https://lkml.org/lkml/2021/7/6/1108
[2] https://lkml.org/lkml/2021/7/14/736
This patch (of 2):
Let's produce a warning if we trying to remove non-existing IPC object
from IPC namespace kht/idr structures.
This allows to catch possible bugs when ipc_rmid() function was called
with inconsistent struct ipc_ids*, struct kern_ipc_perm* arguments.
Link: https://lkml.kernel.org/r/20211027224348.611025-1-alexander.mikhalitsyn@vir…
Link: https://lkml.kernel.org/r/20211027224348.611025-2-alexander.mikhalitsyn@vir…
Co-developed-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: Andrei Vagin <avagin(a)gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Cc: Vasily Averin <vvs(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
ipc/util.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/ipc/util.c~ipc-warn-if-trying-to-remove-ipc-object-which-is-absent
+++ a/ipc/util.c
@@ -447,8 +447,8 @@ static int ipcget_public(struct ipc_name
static void ipc_kht_remove(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
{
if (ipcp->key != IPC_PRIVATE)
- rhashtable_remove_fast(&ids->key_ht, &ipcp->khtnode,
- ipc_kht_params);
+ WARN_ON_ONCE(rhashtable_remove_fast(&ids->key_ht, &ipcp->khtnode,
+ ipc_kht_params));
}
/**
@@ -498,7 +498,7 @@ void ipc_rmid(struct ipc_ids *ids, struc
{
int idx = ipcid_to_idx(ipcp->id);
- idr_remove(&ids->ipcs_idr, idx);
+ WARN_ON_ONCE(idr_remove(&ids->ipcs_idr, idx) != ipcp);
ipc_kht_remove(ids, ipcp);
ids->in_use--;
ipcp->deleted = true;
_
Reading from the CMOS involves writing to the index register and then
reading from the data register. Therefore access to the CMOS has to be
serialized with rtc_lock. This invocation of CMOS_READ was not
serialized, which could cause trouble when other code is accessing CMOS
at the same time.
Use spin_lock_irq() like the rest of the function.
Nothing in kernel modifies the RTC_DM_BINARY bit, so there could be a
separate pair of spin_lock_irq() / spin_unlock_irq() before doing the
math.
Signed-off-by: Mateusz Jończyk <mat.jonczyk(a)o2.pl>
Reviewed-by: Nobuhiro Iwamatsu <iwamatsu(a)nigauri.org>
Cc: Alessandro Zummo <a.zummo(a)towertech.it>
Cc: Alexandre Belloni <alexandre.belloni(a)bootlin.com>
Cc: stable(a)vger.kernel.org
---
drivers/rtc/rtc-cmos.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
index 4eb53412b808..dc3f8b0dde98 100644
--- a/drivers/rtc/rtc-cmos.c
+++ b/drivers/rtc/rtc-cmos.c
@@ -457,7 +457,10 @@ static int cmos_set_alarm(struct device *dev, struct rtc_wkalrm *t)
min = t->time.tm_min;
sec = t->time.tm_sec;
+ spin_lock_irq(&rtc_lock);
rtc_control = CMOS_READ(RTC_CONTROL);
+ spin_unlock_irq(&rtc_lock);
+
if (!(rtc_control & RTC_DM_BINARY) || RTC_ALWAYS_BCD) {
/* Writing 0xff means "don't care" or "match all". */
mon = (mon <= 12) ? bin2bcd(mon) : 0xff;
--
2.25.1
The function nand_davinci_read_page_hwecc_oob_first() does read the ECC
data from the OOB area. Therefore it does not need to calculate the ECC
as it is already available.
Cc: <stable(a)vger.kernel.org> # v5.2
Fixes: a0ac778eb82c ("mtd: rawnand: ingenic: Add support for the JZ4740")
Signed-off-by: Paul Cercueil <paul(a)crapouillou.net>
---
drivers/mtd/nand/raw/davinci_nand.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/mtd/nand/raw/davinci_nand.c b/drivers/mtd/nand/raw/davinci_nand.c
index 118da9944e3b..89de24d3bb7a 100644
--- a/drivers/mtd/nand/raw/davinci_nand.c
+++ b/drivers/mtd/nand/raw/davinci_nand.c
@@ -394,7 +394,6 @@ static int nand_davinci_read_page_hwecc_oob_first(struct nand_chip *chip,
int eccsteps = chip->ecc.steps;
uint8_t *p = buf;
uint8_t *ecc_code = chip->ecc.code_buf;
- uint8_t *ecc_calc = chip->ecc.calc_buf;
unsigned int max_bitflips = 0;
/* Read the OOB area first */
@@ -420,8 +419,6 @@ static int nand_davinci_read_page_hwecc_oob_first(struct nand_chip *chip,
if (ret)
return ret;
- chip->ecc.calculate(chip, p, &ecc_calc[i]);
-
stat = chip->ecc.correct(chip, p, &ecc_code[i], NULL);
if (stat == -EBADMSG &&
(chip->ecc.options & NAND_ECC_GENERIC_ERASED_CHECK)) {
--
2.33.0
From: Stefan Riedmueller <s.riedmueller(a)phytec.de>
There is no need to explicitly set the default gpmi clock rate during
boot for the i.MX 6 since this is done during nand_detect anyway.
Signed-off-by: Stefan Riedmueller <s.riedmueller(a)phytec.de>
Cc: stable(a)vger.kernel.org
---
@stable: This patch fixes a bug because this (superfluous) call to
clk_set_rate() misses the required clock gating. The resulting
clock glitches can prevent the system from booting.
Changelog:
-----------
RFC --> v1
- Cc: stable(a)vger.kernel.org
drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c | 9 ---------
1 file changed, 9 deletions(-)
diff --git a/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
index 4d08e4ab5c1b..a1f7000f033e 100644
--- a/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
+++ b/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
@@ -1034,15 +1034,6 @@ static int gpmi_get_clks(struct gpmi_nand_data *this)
r->clock[i] = clk;
}
- if (GPMI_IS_MX6(this))
- /*
- * Set the default value for the gpmi clock.
- *
- * If you want to use the ONFI nand which is in the
- * Synchronous Mode, you should change the clock as you need.
- */
- clk_set_rate(r->clock[0], 22000000);
-
return 0;
err_clock:
--
Christian Eggers
Embedded software developer
Arnold & Richter Cine Technik GmbH & Co. Betriebs KG
Sitz: Muenchen - Registergericht: Amtsgericht Muenchen - Handelsregisternummer: HRA 57918
Persoenlich haftender Gesellschafter: Arnold & Richter Cine Technik GmbH
Sitz: Muenchen - Registergericht: Amtsgericht Muenchen - Handelsregisternummer: HRB 54477
Geschaeftsfuehrer: Dr. Michael Neuhaeuser; Stephan Schenk; Walter Trauninger; Markus Zeiler
Not the child partition should be removed from the partition list
but the partition itself. Otherwise the partition list gets broken
and any subsequent remove operations leads to a kernel panic.
Fixes: 46b5889cc2c5 ("mtd: implement proper partition handling")
Signed-off-by: Andreas Oetken <andreas.oetken(a)siemens-energy.com>
Cc: stable(a)vger.kernel.org
---
drivers/mtd/mtdpart.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mtd/mtdpart.c b/drivers/mtd/mtdpart.c
index 95d47422bbf20..5725818fa199f 100644
--- a/drivers/mtd/mtdpart.c
+++ b/drivers/mtd/mtdpart.c
@@ -313,7 +313,7 @@ static int __mtd_del_partition(struct mtd_info *mtd)
if (err)
return err;
- list_del(&child->part.node);
+ list_del(&mtd->part.node);
free_partition(mtd);
return 0;
--
2.30.2
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: dib0700: fix undefined behavior in tuner shutdown
Author: Michael Kuron <michael.kuron(a)gmail.com>
Date: Sun Sep 26 21:51:26 2021 +0100
This fixes a problem where closing the tuner would leave it in a state
where it would not tune to any channel when reopened. This problem was
discovered as part of https://github.com/hselasky/webcamd/issues/16.
Since adap->id is 0 or 1, this bit-shift overflows, which is undefined
behavior. The driver still worked in practice as the overflow would in
most environments result in 0, which rendered the line a no-op. When
running the driver as part of webcamd however, the overflow could lead
to 0xff due to optimizations by the compiler, which would, in the end,
improperly shut down the tuner.
The bug is a regression introduced in the commit referenced below. The
present patch causes identical behavior to before that commit for
adap->id equal to 0 or 1. The driver does not contain support for
dib0700 devices with more adapters, assuming such even exist.
Tests have been performed with the Xbox One Digital TV Tuner on amd64.
Not all dib0700 devices are expected to be affected by the regression;
this code path is only taken by those with incorrect endpoint numbers.
Link: https://lore.kernel.org/linux-media/1d2fc36d94ced6f67c7cc21dcc469d5e5bdd820…
Cc: stable(a)vger.kernel.org
Fixes: 7757ddda6f4f ("[media] DiB0700: add function to change I2C-speed")
Signed-off-by: Michael Kuron <michael.kuron(a)gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
drivers/media/usb/dvb-usb/dib0700_core.c | 2 --
1 file changed, 2 deletions(-)
---
diff --git a/drivers/media/usb/dvb-usb/dib0700_core.c b/drivers/media/usb/dvb-usb/dib0700_core.c
index 70219b3e8566..7ea8f68b0f45 100644
--- a/drivers/media/usb/dvb-usb/dib0700_core.c
+++ b/drivers/media/usb/dvb-usb/dib0700_core.c
@@ -618,8 +618,6 @@ int dib0700_streaming_ctrl(struct dvb_usb_adapter *adap, int onoff)
deb_info("the endpoint number (%i) is not correct, use the adapter id instead", adap->fe_adap[0].stream.props.endpoint);
if (onoff)
st->channel_state |= 1 << (adap->id);
- else
- st->channel_state |= 1 << ~(adap->id);
} else {
if (onoff)
st->channel_state |= 1 << (adap->fe_adap[0].stream.props.endpoint-2);
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: dib0700: fix undefined behavior in tuner shutdown
Author: Michael Kuron <michael.kuron(a)gmail.com>
Date: Sun Sep 26 21:51:26 2021 +0100
This fixes a problem where closing the tuner would leave it in a state
where it would not tune to any channel when reopened. This problem was
discovered as part of https://github.com/hselasky/webcamd/issues/16.
Since adap->id is 0 or 1, this bit-shift overflows, which is undefined
behavior. The driver still worked in practice as the overflow would in
most environments result in 0, which rendered the line a no-op. When
running the driver as part of webcamd however, the overflow could lead
to 0xff due to optimizations by the compiler, which would, in the end,
improperly shut down the tuner.
The bug is a regression introduced in the commit referenced below. The
present patch causes identical behavior to before that commit for
adap->id equal to 0 or 1. The driver does not contain support for
dib0700 devices with more adapters, assuming such even exist.
Tests have been performed with the Xbox One Digital TV Tuner on amd64.
Not all dib0700 devices are expected to be affected by the regression;
this code path is only taken by those with incorrect endpoint numbers.
Link: https://lore.kernel.org/linux-media/1d2fc36d94ced6f67c7cc21dcc469d5e5bdd820…
Cc: stable(a)vger.kernel.org
Fixes: 7757ddda6f4f ("[media] DiB0700: add function to change I2C-speed")
Signed-off-by: Michael Kuron <michael.kuron(a)gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
drivers/media/usb/dvb-usb/dib0700_core.c | 2 --
1 file changed, 2 deletions(-)
---
diff --git a/drivers/media/usb/dvb-usb/dib0700_core.c b/drivers/media/usb/dvb-usb/dib0700_core.c
index 70219b3e8566..7ea8f68b0f45 100644
--- a/drivers/media/usb/dvb-usb/dib0700_core.c
+++ b/drivers/media/usb/dvb-usb/dib0700_core.c
@@ -618,8 +618,6 @@ int dib0700_streaming_ctrl(struct dvb_usb_adapter *adap, int onoff)
deb_info("the endpoint number (%i) is not correct, use the adapter id instead", adap->fe_adap[0].stream.props.endpoint);
if (onoff)
st->channel_state |= 1 << (adap->id);
- else
- st->channel_state |= 1 << ~(adap->id);
} else {
if (onoff)
st->channel_state |= 1 << (adap->fe_adap[0].stream.props.endpoint-2);
From: Pavel Skripkin <paskripkin(a)gmail.com>
commit 6f68cd634856f8ca93bafd623ba5357e0f648c68 upstream.
Syzbot reported ODEBUG warning in batadv_nc_mesh_free(). The problem was
in wrong error handling in batadv_mesh_init().
Before this patch batadv_mesh_init() was calling batadv_mesh_free() in case
of any batadv_*_init() calls failure. This approach may work well, when
there is some kind of indicator, which can tell which parts of batadv are
initialized; but there isn't any.
All written above lead to cleaning up uninitialized fields. Even if we hide
ODEBUG warning by initializing bat_priv->nc.work, syzbot was able to hit
GPF in batadv_nc_purge_paths(), because hash pointer in still NULL. [1]
To fix these bugs we can unwind batadv_*_init() calls one by one.
It is good approach for 2 reasons: 1) It fixes bugs on error handling
path 2) It improves the performance, since we won't call unneeded
batadv_*_free() functions.
So, this patch makes all batadv_*_init() clean up all allocated memory
before returning with an error to no call correspoing batadv_*_free()
and open-codes batadv_mesh_free() with proper order to avoid touching
uninitialized fields.
Link: https://lore.kernel.org/netdev/000000000000c87fbd05cef6bcb0@google.com/ [1]
Reported-and-tested-by: syzbot+28b0702ada0bf7381f58(a)syzkaller.appspotmail.com
Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
Signed-off-by: Pavel Skripkin <paskripkin(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
[ bp: 4.4 backport: Drop batadv_v_mesh_{init,free} which are not there yet. ]
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
---
Submission according to the request in
https://lore.kernel.org/all/163559888490194@kroah.com/
net/batman-adv/bridge_loop_avoidance.c | 8 +++--
net/batman-adv/main.c | 44 +++++++++++++++++++-------
net/batman-adv/network-coding.c | 4 ++-
net/batman-adv/translation-table.c | 4 ++-
4 files changed, 44 insertions(+), 16 deletions(-)
diff --git a/net/batman-adv/bridge_loop_avoidance.c b/net/batman-adv/bridge_loop_avoidance.c
index 1267cbb1a329..5e59a6ecae42 100644
--- a/net/batman-adv/bridge_loop_avoidance.c
+++ b/net/batman-adv/bridge_loop_avoidance.c
@@ -1346,10 +1346,14 @@ int batadv_bla_init(struct batadv_priv *bat_priv)
return 0;
bat_priv->bla.claim_hash = batadv_hash_new(128);
- bat_priv->bla.backbone_hash = batadv_hash_new(32);
+ if (!bat_priv->bla.claim_hash)
+ return -ENOMEM;
- if (!bat_priv->bla.claim_hash || !bat_priv->bla.backbone_hash)
+ bat_priv->bla.backbone_hash = batadv_hash_new(32);
+ if (!bat_priv->bla.backbone_hash) {
+ batadv_hash_destroy(bat_priv->bla.claim_hash);
return -ENOMEM;
+ }
batadv_hash_set_lock_class(bat_priv->bla.claim_hash,
&batadv_claim_hash_lock_class_key);
diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c
index 88cea5154113..8ba7b86579d4 100644
--- a/net/batman-adv/main.c
+++ b/net/batman-adv/main.c
@@ -159,24 +159,34 @@ int batadv_mesh_init(struct net_device *soft_iface)
INIT_HLIST_HEAD(&bat_priv->softif_vlan_list);
ret = batadv_originator_init(bat_priv);
- if (ret < 0)
- goto err;
+ if (ret < 0) {
+ atomic_set(&bat_priv->mesh_state, BATADV_MESH_DEACTIVATING);
+ goto err_orig;
+ }
ret = batadv_tt_init(bat_priv);
- if (ret < 0)
- goto err;
+ if (ret < 0) {
+ atomic_set(&bat_priv->mesh_state, BATADV_MESH_DEACTIVATING);
+ goto err_tt;
+ }
ret = batadv_bla_init(bat_priv);
- if (ret < 0)
- goto err;
+ if (ret < 0) {
+ atomic_set(&bat_priv->mesh_state, BATADV_MESH_DEACTIVATING);
+ goto err_bla;
+ }
ret = batadv_dat_init(bat_priv);
- if (ret < 0)
- goto err;
+ if (ret < 0) {
+ atomic_set(&bat_priv->mesh_state, BATADV_MESH_DEACTIVATING);
+ goto err_dat;
+ }
ret = batadv_nc_mesh_init(bat_priv);
- if (ret < 0)
- goto err;
+ if (ret < 0) {
+ atomic_set(&bat_priv->mesh_state, BATADV_MESH_DEACTIVATING);
+ goto err_nc;
+ }
batadv_gw_init(bat_priv);
batadv_mcast_init(bat_priv);
@@ -186,8 +196,18 @@ int batadv_mesh_init(struct net_device *soft_iface)
return 0;
-err:
- batadv_mesh_free(soft_iface);
+err_nc:
+ batadv_dat_free(bat_priv);
+err_dat:
+ batadv_bla_free(bat_priv);
+err_bla:
+ batadv_tt_free(bat_priv);
+err_tt:
+ batadv_originator_free(bat_priv);
+err_orig:
+ batadv_purge_outstanding_packets(bat_priv, NULL);
+ atomic_set(&bat_priv->mesh_state, BATADV_MESH_INACTIVE);
+
return ret;
}
diff --git a/net/batman-adv/network-coding.c b/net/batman-adv/network-coding.c
index 91de807a8f03..9317d872b9c0 100644
--- a/net/batman-adv/network-coding.c
+++ b/net/batman-adv/network-coding.c
@@ -159,8 +159,10 @@ int batadv_nc_mesh_init(struct batadv_priv *bat_priv)
&batadv_nc_coding_hash_lock_class_key);
bat_priv->nc.decoding_hash = batadv_hash_new(128);
- if (!bat_priv->nc.decoding_hash)
+ if (!bat_priv->nc.decoding_hash) {
+ batadv_hash_destroy(bat_priv->nc.coding_hash);
goto err;
+ }
batadv_hash_set_lock_class(bat_priv->nc.decoding_hash,
&batadv_nc_decoding_hash_lock_class_key);
diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index 5f976485e8c6..1ad90267064d 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -3833,8 +3833,10 @@ int batadv_tt_init(struct batadv_priv *bat_priv)
return ret;
ret = batadv_tt_global_init(bat_priv);
- if (ret < 0)
+ if (ret < 0) {
+ batadv_tt_local_table_free(bat_priv);
return ret;
+ }
batadv_tvlv_handler_register(bat_priv, batadv_tt_tvlv_ogm_handler_v1,
batadv_tt_tvlv_unicast_handler_v1,
--
2.30.2
This patch is for linux-5.10.y only.
When scripts/lld-version.sh was initially written, it did not account
for the LLD_VENDOR cmake flag, which changes the output of ld.lld's
--version flag slightly.
Without LLD_VENDOR:
$ ld.lld --version
LLD 14.0.0 (compatible with GNU linkers)
With LLD_VENDOR:
$ ld.lld --version
Debian LLD 14.0.0 (compatible with GNU linkers)
As a result, CONFIG_LLD_VERSION is messed up and configuration values
that are dependent on it cannot be selected:
scripts/lld-version.sh: 20: printf: LLD: expected numeric value
scripts/lld-version.sh: 20: printf: LLD: expected numeric value
scripts/lld-version.sh: 20: printf: LLD: expected numeric value
init/Kconfig:52:warning: 'LLD_VERSION': number is invalid
.config:11:warning: symbol value '00000' invalid for LLD_VERSION
.config:8800:warning: override: CPU_BIG_ENDIAN changes choice state
This was fixed upstream by commit 1f09af062556 ("kbuild: Fix
ld-version.sh script if LLD was built with LLD_VENDOR") in 5.12 but that
was done to ld-version.sh after it was massively rewritten in
commit 02aff8592204 ("kbuild: check the minimum linker version in
Kconfig").
To avoid bringing in that change plus its prerequisites and fixes, just
modify lld-version.sh to make it similar to the upstream ld-version.sh,
which handles ld.lld with or without LLD_VENDOR and ld.bfd without any
errors.
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
Our CI caught this error with newer versions of Debian's ld.lld, which
added LLD_VENDOR it seems:
https://github.com/ClangBuiltLinux/continuous-integration2/runs/4206343929?…
A similar change was done by me for Android, where it has seen no
issues:
https://android-review.googlesource.com/c/kernel/common/+/1744324
I believe this is a safer change than backporting the fixes from
upstream but if you guys feel otherwise, I can do so. This is only
needed in 5.10 as CONFIG_LLD_VERSION does not exist in 5.4 and it was
fixed in 5.12 upstream.
scripts/lld-version.sh | 35 ++++++++++++++++++++++++++---------
1 file changed, 26 insertions(+), 9 deletions(-)
diff --git a/scripts/lld-version.sh b/scripts/lld-version.sh
index d70edb4d8a4f..f1eeee450a23 100755
--- a/scripts/lld-version.sh
+++ b/scripts/lld-version.sh
@@ -6,15 +6,32 @@
# Print the linker version of `ld.lld' in a 5 or 6-digit form
# such as `100001' for ld.lld 10.0.1 etc.
-linker_string="$($* --version)"
+set -e
-if ! ( echo $linker_string | grep -q LLD ); then
+# Convert the version string x.y.z to a canonical 5 or 6-digit form.
+get_canonical_version()
+{
+ IFS=.
+ set -- $1
+
+ # If the 2nd or 3rd field is missing, fill it with a zero.
+ echo $((10000 * $1 + 100 * ${2:-0} + ${3:-0}))
+}
+
+# Get the first line of the --version output.
+IFS='
+'
+set -- $(LC_ALL=C "$@" --version)
+
+# Split the line on spaces.
+IFS=' '
+set -- $1
+
+while [ $# -gt 1 -a "$1" != "LLD" ]; do
+ shift
+done
+if [ "$1" = LLD ]; then
+ echo $(get_canonical_version ${2%-*})
+else
echo 0
- exit 1
fi
-
-VERSION=$(echo $linker_string | cut -d ' ' -f 2)
-MAJOR=$(echo $VERSION | cut -d . -f 1)
-MINOR=$(echo $VERSION | cut -d . -f 2)
-PATCHLEVEL=$(echo $VERSION | cut -d . -f 3)
-printf "%d%02d%02d\\n" $MAJOR $MINOR $PATCHLEVEL
base-commit: bd816c278316f20a5575debc64dde4422229a880
--
2.34.0.rc0
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 86432a6dca9bed79111990851df5756d3eb5f57c Mon Sep 17 00:00:00 2001
From: Gao Xiang <hsiangkao(a)linux.alibaba.com>
Date: Thu, 4 Nov 2021 02:20:06 +0800
Subject: [PATCH] erofs: fix unsafe pagevec reuse of hooked pclusters
There are pclusters in runtime marked with Z_EROFS_PCLUSTER_TAIL
before actual I/O submission. Thus, the decompression chain can be
extended if the following pcluster chain hooks such tail pcluster.
As the related comment mentioned, if some page is made of a hooked
pcluster and another followed pcluster, it can be reused for in-place
I/O (since I/O should be submitted anyway):
_______________________________________________________________
| tail (partial) page | head (partial) page |
|_____PRIMARY_HOOKED___|____________PRIMARY_FOLLOWED____________|
However, it's by no means safe to reuse as pagevec since if such
PRIMARY_HOOKED pclusters finally move into bypass chain without I/O
submission. It's somewhat hard to reproduce with LZ4 and I just found
it (general protection fault) by ro_fsstressing a LZMA image for long
time.
I'm going to actively clean up related code together with multi-page
folio adaption in the next few months. Let's address it directly for
easier backporting for now.
Call trace for reference:
z_erofs_decompress_pcluster+0x10a/0x8a0 [erofs]
z_erofs_decompress_queue.isra.36+0x3c/0x60 [erofs]
z_erofs_runqueue+0x5f3/0x840 [erofs]
z_erofs_readahead+0x1e8/0x320 [erofs]
read_pages+0x91/0x270
page_cache_ra_unbounded+0x18b/0x240
filemap_get_pages+0x10a/0x5f0
filemap_read+0xa9/0x330
new_sync_read+0x11b/0x1a0
vfs_read+0xf1/0x190
Link: https://lore.kernel.org/r/20211103182006.4040-1-xiang@kernel.org
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Cc: <stable(a)vger.kernel.org> # 4.19+
Reviewed-by: Chao Yu <chao(a)kernel.org>
Signed-off-by: Gao Xiang <hsiangkao(a)linux.alibaba.com>
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 11c7a1aaebad..eb51df4a9f77 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -373,8 +373,8 @@ static bool z_erofs_try_inplace_io(struct z_erofs_collector *clt,
/* callers must be with collection lock held */
static int z_erofs_attach_page(struct z_erofs_collector *clt,
- struct page *page,
- enum z_erofs_page_type type)
+ struct page *page, enum z_erofs_page_type type,
+ bool pvec_safereuse)
{
int ret;
@@ -384,9 +384,9 @@ static int z_erofs_attach_page(struct z_erofs_collector *clt,
z_erofs_try_inplace_io(clt, page))
return 0;
- ret = z_erofs_pagevec_enqueue(&clt->vector, page, type);
+ ret = z_erofs_pagevec_enqueue(&clt->vector, page, type,
+ pvec_safereuse);
clt->cl->vcnt += (unsigned int)ret;
-
return ret ? 0 : -EAGAIN;
}
@@ -729,7 +729,8 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
tight &= (clt->mode >= COLLECT_PRIMARY_FOLLOWED);
retry:
- err = z_erofs_attach_page(clt, page, page_type);
+ err = z_erofs_attach_page(clt, page, page_type,
+ clt->mode >= COLLECT_PRIMARY_FOLLOWED);
/* should allocate an additional short-lived page for pagevec */
if (err == -EAGAIN) {
struct page *const newpage =
@@ -737,7 +738,7 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
set_page_private(newpage, Z_EROFS_SHORTLIVED_PAGE);
err = z_erofs_attach_page(clt, newpage,
- Z_EROFS_PAGE_TYPE_EXCLUSIVE);
+ Z_EROFS_PAGE_TYPE_EXCLUSIVE, true);
if (!err)
goto retry;
}
diff --git a/fs/erofs/zpvec.h b/fs/erofs/zpvec.h
index dfd7fe0503bb..b05464f4a808 100644
--- a/fs/erofs/zpvec.h
+++ b/fs/erofs/zpvec.h
@@ -106,11 +106,18 @@ static inline void z_erofs_pagevec_ctor_init(struct z_erofs_pagevec_ctor *ctor,
static inline bool z_erofs_pagevec_enqueue(struct z_erofs_pagevec_ctor *ctor,
struct page *page,
- enum z_erofs_page_type type)
+ enum z_erofs_page_type type,
+ bool pvec_safereuse)
{
- if (!ctor->next && type)
- if (ctor->index + 1 == ctor->nr)
+ if (!ctor->next) {
+ /* some pages cannot be reused as pvec safely without I/O */
+ if (type == Z_EROFS_PAGE_TYPE_EXCLUSIVE && !pvec_safereuse)
+ type = Z_EROFS_VLE_PAGE_TYPE_TAIL_SHARED;
+
+ if (type != Z_EROFS_PAGE_TYPE_EXCLUSIVE &&
+ ctor->index + 1 == ctor->nr)
return false;
+ }
if (ctor->index >= ctor->nr)
z_erofs_pagevec_ctor_pagedown(ctor, false);
Hi, Thomas,
On Fri, 5 Nov 2021 at 13:14, Thomas Gleixner <tglx(a)linutronix.de> wrote:
>
> On Thu, Nov 04 2021 at 18:01, Marc Zyngier wrote:
> > Rui reported[1] that his Nvidia ION system stopped working with 5.15,
> > with the AHCI device failing to get any MSI. A rapid investigation
> > revealed that although the device doesn't advertise MSI masking, it
> > actually needs it. Quality hardware indeed.
> >
> > Anyway, the couple of patches below are an attempt at dealing with the
> > issue in a more or less generic way.
> >
> > [1] https://lore.kernel.org/r/CALjTZvbzYfBuLB+H=fj2J+9=DxjQ2Uqcy0if_PvmJ-nU-qEg…
> >
> > Marc Zyngier (2):
> > PCI: MSI: Deal with devices lying about their MSI mask capability
> > PCI: Add MSI masking quirk for Nvidia ION AHCI
> >
> > drivers/pci/msi.c | 3 +++
> > drivers/pci/quirks.c | 6 ++++++
> > include/linux/pci.h | 2 ++
> > 3 files changed, 11 insertions(+)
>
> Groan.
>
> Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
Just a reminder, to make sure this doesn't fall through the cracks.
It's already in 5.16, but needs to be backported to 5.15. I'm not
seeing it in Greg's 5.15 stable queue yet.
Thanks,
Rui
The patch below does not apply to the 5.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3735459037114d31e5acd9894fad9aed104231a0 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Tue, 9 Nov 2021 14:53:57 +0100
Subject: [PATCH] PCI/MSI: Destroy sysfs before freeing entries
free_msi_irqs() frees the MSI entries before destroying the sysfs entries
which are exposing them. Nothing prevents a concurrent free while a sysfs
file is read and accesses the possibly freed entry.
Move the sysfs release ahead of freeing the entries.
Fixes: 1c51b50c2995 ("PCI/MSI: Export MSI mode using attributes, not kobjects")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Bjorn Helgaas <helgaas(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87sfw5305m.ffs@tglx
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 70433013897b..48e3f4e47b29 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -368,6 +368,11 @@ static void free_msi_irqs(struct pci_dev *dev)
for (i = 0; i < entry->nvec_used; i++)
BUG_ON(irq_has_action(entry->irq + i));
+ if (dev->msi_irq_groups) {
+ msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
+ dev->msi_irq_groups = NULL;
+ }
+
pci_msi_teardown_msi_irqs(dev);
list_for_each_entry_safe(entry, tmp, msi_list, list) {
@@ -379,11 +384,6 @@ static void free_msi_irqs(struct pci_dev *dev)
list_del(&entry->list);
free_msi_entry(entry);
}
-
- if (dev->msi_irq_groups) {
- msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
- dev->msi_irq_groups = NULL;
- }
}
static void pci_intx_for_msi(struct pci_dev *dev, int enable)
Please apply this patch to the stable kernels up to v5.15.
It's basically upstream commit 3ec18fc7831e7d79e2d536dd1f3bc0d3ba425e8a,
adjusted so that it applies to the stable kernels.
It requires that upstream commit 8779e05ba8aaffec1829872ef9774a71f44f6580
is applied before, which shouldn't be a problem as it was tagged for
stable series in the original commmit already.
Thanks,
Helge
--------
From: Sven Schnelle <svens(a)stackframe.org>
Date: Sat, 13 Nov 2021 20:41:17 +0100
Subject: [PATCH] parisc/entry: fix trace test in syscall exit path
Upstream commit: 3ec18fc7831e7d79e2d536dd1f3bc0d3ba425e8a
commit 8779e05ba8aa ("parisc: Fix ptrace check on syscall return")
fixed testing of TI_FLAGS. This uncovered a bug in the test mask.
syscall_restore_rfi is only used when the kernel needs to exit to
usespace with single or block stepping and the recovery counter
enabled. The test however used _TIF_SYSCALL_TRACE_MASK, which
includes a lot of bits that shouldn't be tested here.
Fix this by using TIF_SINGLESTEP and TIF_BLOCKSTEP directly.
I encountered this bug by enabling syscall tracepoints. Both in qemu and
on real hardware. As soon as i enabled the tracepoint (sys_exit_read,
but i guess it doesn't really matter which one), i got random page
faults in userspace almost immediately.
Signed-off-by: Sven Schnelle <svens(a)stackframe.org>
Signed-off-by: Helge Deller <deller(a)gmx.de>
diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
index 2716e58b498b..437c8d31f390 100644
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -1835,7 +1835,7 @@ syscall_restore:
/* Are we being ptraced? */
LDREG TI_FLAGS-THREAD_SZ_ALGN-FRAME_SIZE(%r30),%r19
- ldi _TIF_SYSCALL_TRACE_MASK,%r2
+ ldi _TIF_SINGLESTEP|_TIF_BLOCKSTEP,%r2
and,COND(=) %r19,%r2,%r0
b,n syscall_restore_rfi
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4030a6e6a6a4a42ff8c18414c9e0c93e24cc70b8 Mon Sep 17 00:00:00 2001
From: Paul Burton <paulburton(a)google.com>
Date: Thu, 1 Jul 2021 10:24:07 -0700
Subject: [PATCH] tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT
Currently tgid_map is sized at PID_MAX_DEFAULT entries, which means that
on systems where pid_max is configured higher than PID_MAX_DEFAULT the
ftrace record-tgid option doesn't work so well. Any tasks with PIDs
higher than PID_MAX_DEFAULT are simply not recorded in tgid_map, and
don't show up in the saved_tgids file.
In particular since systemd v243 & above configure pid_max to its
highest possible 1<<22 value by default on 64 bit systems this renders
the record-tgids option of little use.
Increase the size of tgid_map to the configured pid_max instead,
allowing it to cover the full range of PIDs up to the maximum value of
PID_MAX_LIMIT if the system is configured that way.
On 64 bit systems with pid_max == PID_MAX_LIMIT this will increase the
size of tgid_map from 256KiB to 16MiB. Whilst this 64x increase in
memory overhead sounds significant 64 bit systems are presumably best
placed to accommodate it, and since tgid_map is only allocated when the
record-tgid option is actually used presumably the user would rather it
spends sufficient memory to actually record the tgids they expect.
The size of tgid_map could also increase for CONFIG_BASE_SMALL=y
configurations, but these seem unlikely to be systems upon which people
are both configuring a large pid_max and running ftrace with record-tgid
anyway.
Of note is that we only allocate tgid_map once, the first time that the
record-tgid option is enabled. Therefore its size is only set once, to
the value of pid_max at the time the record-tgid option is first
enabled. If a user increases pid_max after that point, the saved_tgids
file will not contain entries for any tasks with pids beyond the earlier
value of pid_max.
Link: https://lkml.kernel.org/r/20210701172407.889626-2-paulburton@google.com
Fixes: d914ba37d714 ("tracing: Add support for recording tgid of tasks")
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Joel Fernandes <joelaf(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Paul Burton <paulburton(a)google.com>
[ Fixed comment coding style ]
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4843076d67d3..14f56e9fa001 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2191,8 +2191,15 @@ void tracing_reset_all_online_cpus(void)
}
}
+/*
+ * The tgid_map array maps from pid to tgid; i.e. the value stored at index i
+ * is the tgid last observed corresponding to pid=i.
+ */
static int *tgid_map;
+/* The maximum valid index into tgid_map. */
+static size_t tgid_map_max;
+
#define SAVED_CMDLINES_DEFAULT 128
#define NO_CMDLINE_MAP UINT_MAX
static arch_spinlock_t trace_cmdline_lock = __ARCH_SPIN_LOCK_UNLOCKED;
@@ -2468,24 +2475,41 @@ void trace_find_cmdline(int pid, char comm[])
preempt_enable();
}
+static int *trace_find_tgid_ptr(int pid)
+{
+ /*
+ * Pairs with the smp_store_release in set_tracer_flag() to ensure that
+ * if we observe a non-NULL tgid_map then we also observe the correct
+ * tgid_map_max.
+ */
+ int *map = smp_load_acquire(&tgid_map);
+
+ if (unlikely(!map || pid > tgid_map_max))
+ return NULL;
+
+ return &map[pid];
+}
+
int trace_find_tgid(int pid)
{
- if (unlikely(!tgid_map || !pid || pid > PID_MAX_DEFAULT))
- return 0;
+ int *ptr = trace_find_tgid_ptr(pid);
- return tgid_map[pid];
+ return ptr ? *ptr : 0;
}
static int trace_save_tgid(struct task_struct *tsk)
{
+ int *ptr;
+
/* treat recording of idle task as a success */
if (!tsk->pid)
return 1;
- if (unlikely(!tgid_map || tsk->pid > PID_MAX_DEFAULT))
+ ptr = trace_find_tgid_ptr(tsk->pid);
+ if (!ptr)
return 0;
- tgid_map[tsk->pid] = tsk->tgid;
+ *ptr = tsk->tgid;
return 1;
}
@@ -5225,6 +5249,8 @@ int trace_keep_overwrite(struct tracer *tracer, u32 mask, int set)
int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled)
{
+ int *map;
+
if ((mask == TRACE_ITER_RECORD_TGID) ||
(mask == TRACE_ITER_RECORD_CMD))
lockdep_assert_held(&event_mutex);
@@ -5247,10 +5273,19 @@ int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled)
trace_event_enable_cmd_record(enabled);
if (mask == TRACE_ITER_RECORD_TGID) {
- if (!tgid_map)
- tgid_map = kvcalloc(PID_MAX_DEFAULT + 1,
- sizeof(*tgid_map),
- GFP_KERNEL);
+ if (!tgid_map) {
+ tgid_map_max = pid_max;
+ map = kvcalloc(tgid_map_max + 1, sizeof(*tgid_map),
+ GFP_KERNEL);
+
+ /*
+ * Pairs with smp_load_acquire() in
+ * trace_find_tgid_ptr() to ensure that if it observes
+ * the tgid_map we just allocated then it also observes
+ * the corresponding tgid_map_max value.
+ */
+ smp_store_release(&tgid_map, map);
+ }
if (!tgid_map) {
tr->trace_flags &= ~TRACE_ITER_RECORD_TGID;
return -ENOMEM;
@@ -5664,18 +5699,14 @@ static void *saved_tgids_next(struct seq_file *m, void *v, loff_t *pos)
{
int pid = ++(*pos);
- if (pid > PID_MAX_DEFAULT)
- return NULL;
-
- return &tgid_map[pid];
+ return trace_find_tgid_ptr(pid);
}
static void *saved_tgids_start(struct seq_file *m, loff_t *pos)
{
- if (!tgid_map || *pos > PID_MAX_DEFAULT)
- return NULL;
+ int pid = *pos;
- return &tgid_map[*pos];
+ return trace_find_tgid_ptr(pid);
}
static void saved_tgids_stop(struct seq_file *m, void *v)
From: Meng Li <meng.li(a)windriver.com>
In stable kernel v5.10, when run below command to remove ethernet driver on
stratix10 platform, there will be warning trace as below:
$ cd /sys/class/net/eth0/device/driver/
$ echo ff800000.ethernet > unbind
WARNING: CPU: 3 PID: 386 at drivers/clk/clk.c:810 clk_core_unprepare+0x114/0x274
Modules linked in: sch_fq_codel
CPU: 3 PID: 386 Comm: sh Tainted: G W 5.10.74-yocto-standard #1
Hardware name: SoCFPGA Stratix 10 SoCDK (DT)
pstate: 00000005 (nzcv daif -PAN -UAO -TCO BTYPE=--)
pc : clk_core_unprepare+0x114/0x274
lr : clk_core_unprepare+0x114/0x274
sp : ffff800011bdbb10
clk_core_unprepare+0x114/0x274
clk_unprepare+0x38/0x50
stmmac_remove_config_dt+0x40/0x80
stmmac_pltfr_remove+0x64/0x80
platform_drv_remove+0x38/0x60
... ..
el0_sync_handler+0x1a4/0x1b0
el0_sync+0x180/0x1c0
This issue is introduced by introducing upstream commit 8f269102baf7
("net: stmmac: disable clocks in stmmac_remove_config_dt()")
But in latest mainline kernel, there is no this issue. Because commit
5ec55823438e("net: stmmac: add clocks management for gmac driver") and its
folowing fixing commits improved clocks management for stmmac driver.
Therefore, backport them to stable kernel v5.10.
Joakim Zhang (2):
net: stmmac: add clocks management for gmac driver
net: stmmac: fix system hang if change mac address after interface
ifdown
Michael Riesch (1):
net: stmmac: dwmac-rk: fix unbalanced pm_runtime_enable warnings
Wei Yongjun (1):
net: stmmac: platform: fix build error with !CONFIG_PM_SLEEP
Wong Vee Khee (1):
net: stmmac: fix issue where clk is being unprepared twice
Yang Yingliang (1):
net: stmmac: fix missing unlock on error in stmmac_suspend()
.../net/ethernet/stmicro/stmmac/dwmac-rk.c | 9 --
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 1 +
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 87 ++++++++++++--
.../net/ethernet/stmicro/stmmac/stmmac_mdio.c | 111 ++++++++++++++----
.../ethernet/stmicro/stmmac/stmmac_platform.c | 30 ++++-
5 files changed, 187 insertions(+), 51 deletions(-)
--
2.17.1
The patch below does not apply to the 5.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 1ae43851b18afe861120ebd7c426dc44f06bb2bd Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <mhiramat(a)kernel.org>
Date: Thu, 16 Sep 2021 15:23:12 +0900
Subject: [PATCH] bootconfig: init: Fix memblock leak in xbc_make_cmdline()
Free unused memblock in a error case to fix memblock leak
in xbc_make_cmdline().
Link: https://lkml.kernel.org/r/163177339181.682366.8713781325929549256.stgit@dev…
Fixes: 51887d03aca1 ("bootconfig: init: Allow admin to use bootconfig for kernel command line")
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/init/main.c b/init/main.c
index 81a79a77db46..3c4054a95545 100644
--- a/init/main.c
+++ b/init/main.c
@@ -382,6 +382,7 @@ static char * __init xbc_make_cmdline(const char *key)
ret = xbc_snprint_cmdline(new_cmdline, len + 1, root);
if (ret < 0 || ret > len) {
pr_err("Failed to print extra kernel cmdline.\n");
+ memblock_free_ptr(new_cmdline, len + 1);
return NULL;
}
Hi,
This has triggered in 5.10.77 yesterday [1], and I was able to
reproduce it on 5.10.80 using the C repro from android-54 [2].
What happens is that the function do_mpage_readpage() calls
bdev_read_page() [3] passing in bdev == NULL, and bdev_read_page()
crashes here [4]. This happens in 5.15 down to 5.10, but it is fixed
in 5.16-rc1. I bisected it to the first good commit, which is:
af3c570fb0df ("loop: Use blk_validate_block_size() to validate block size")
The root cause seems to be loss of precision in loop_configure(),
when it calls loop_validate_block_size() in [5]. The config->block_size
is an uint32 and the bsize param passed to loop_validate_block_size() is
unsigned short. The reproducer sets up a loop device with the block size
equal to 0x20000400, which is bigger than USHRT_MAX.
The loop_validate_block_size() returns 0, but uses the invalid size
to setup the device. The new helper changes the bsize param type to uint,
and the issue goes away.
To fix this for the older kernels can we please have the two commits:
570b1cac4776 ("block: Add a helper to validate the block size")
af3c570fb0df ("loop: Use blk_validate_block_size() to validate block size")
applied to 5.15, 5.14, and 5.10.
The first one needs to be back ported, but the second applies cleanly.
I will follow up back ports for each version in few minutes.
--
Thanks,
Tadeusz
[1] https://syzkaller.appspot.com/bug?id=2a34ab9dad714959a3d2b60533acbd99094a5c…
[2] https://syzkaller.appspot.com/x/repro.c?x=13420a05900000
[3] https://elixir.bootlin.com/linux/v5.15/source/fs/mpage.c#L302
[4] https://elixir.bootlin.com/linux/v5.15/source/block/bdev.c#L323
[5] https://elixir.bootlin.com/linux/v5.15/source/drivers/block/loop.c#L1239
Hi Greg and stable team,
Here's a backport of relocation fixes that went into 5.16 aimed at the 5.15.x
series of stable kernels. It's a problem people are currently running into
when using btrfs on a zoned block device.
The following patches have been backported:
960a3166aed0 ("btrfs: zoned: allow preallocation for relocation inodes")
2adada886b26 ("btrfs: check for relocation inodes on zoned btrfs in should_nocow")
e6d261e3b1f7 ("btrfs: zoned: use regular writes for relocation")
35156d852762 ("btrfs: zoned: only allow one process to add pages to a relocation inode")
c2707a255623 ("btrfs: zoned: add a dedicated data relocation block group")
37f00a6d2e9c ("btrfs: introduce btrfs_is_data_reloc_root")
The backport has seen the usual regression testing with xfstests.
Johannes Thumshirn (6):
btrfs: introduce btrfs_is_data_reloc_root
btrfs: zoned: add a dedicated data relocation block group
btrfs: zoned: only allow one process to add pages to a relocation
inode
btrfs: zoned: use regular writes for relocation
btrfs: check for relocation inodes on zoned btrfs in should_nocow
btrfs: zoned: allow preallocation for relocation inodes
fs/btrfs/block-group.c | 1 +
fs/btrfs/ctree.h | 12 +++++++++
fs/btrfs/disk-io.c | 3 ++-
fs/btrfs/extent-tree.c | 56 +++++++++++++++++++++++++++++++++++++++---
fs/btrfs/extent_io.c | 11 +++++++++
fs/btrfs/inode.c | 29 +++++++++++++---------
fs/btrfs/relocation.c | 38 +++-------------------------
fs/btrfs/zoned.c | 21 ++++++++++++++++
fs/btrfs/zoned.h | 3 +++
9 files changed, 123 insertions(+), 51 deletions(-)
--
2.32.0
Hi Greg,
please apply commit 5c4e0a21fae8 ("string: uninline memcpy_and_pad")
to v5.15.y to avoid the following build error seen with gcc 11.x.
Building m68k:allmodconfig ... failed
--------------
Error log:
In file included from include/linux/string.h:20,
from include/linux/bitmap.h:10,
from include/linux/cpumask.h:12,
from include/linux/smp.h:13,
from include/linux/lockdep.h:14,
from include/linux/spinlock.h:63,
from include/linux/mmzone.h:8,
from include/linux/gfp.h:6,
from include/linux/slab.h:15,
from drivers/nvme/target/discovery.c:7:
In function 'memcpy_and_pad',
inlined from 'nvmet_execute_disc_identify' at drivers/nvme/target/discovery.c:268:2:
arch/m68k/include/asm/string.h:72:25: error: '__builtin_memcpy' reading 8 bytes from a region of size 7
Thanks,
Guenter
Hi Paolo and David,
I have a strange compile error which appeared in v5.15.3:
CALL scripts/checksyscalls.sh - due to target missing
CALL scripts/atomic/check-atomics.sh - due to target missing
CHK include/generated/compile.h - due to compile.h not in $(targets)
CC arch/x86/kvm/x86.o - due to target missing
arch/x86/kvm/x86.c: Assembler messages:
arch/x86/kvm/x86.c:3241: Error: bad register name `%dil'
scripts/Makefile.build:277: recipe for target 'arch/x86/kvm/x86.o' failed
make[3]: *** [arch/x86/kvm/x86.o] Error 1
scripts/Makefile.build:540: recipe for target 'arch/x86/kvm' failed
make[2]: *** [arch/x86/kvm] Error 2
Makefile:1868: recipe for target 'arch/x86' failed
make[1]: *** [arch/x86] Error 2
Makefile:350: recipe for target '__build_one_by_one' failed
make: *** [__build_one_by_one] Error 2
My (cross-)compiler is a gcc 6.3.0 for 32 bit x86.
It is neither with v5.15.2 nor v5.16-rc1 nor v5.14.20.
The code line 3241 is:
asm volatile("1: xchgb %0, %2\n"
"xor %1, %1\n"
"2:\n"
_ASM_EXTABLE_UA(1b, 2b)
: "+r" (st_preempted),
"+&r" (err)
: "m" (st->preempted));
This seems to have been introduced by:
9d12bf19b278 KVM: x86: Fix recording of guest steal time / preempted status
but it is a backport of commit 7e2175ebd695f17860c5bd4ad7616cce12ed4591
which was also merged to 5.14.20.
So maybe the backport is incomplete or has some hidden dependency?
But only on the 5.15.y series?
BR and thanks,
Nikolaus Schaller
From: Stefano Stabellini <stefano.stabellini(a)xilinx.com>
If the xenstore page hasn't been allocated properly, reading the value
of the related hvm_param (HVM_PARAM_STORE_PFN) won't actually return
error. Instead, it will succeed and return zero.
Instead of attempting to xen_remap a bad guest physical address, detect
this condition and return early.
Note that although a guest physical address of zero for
HVM_PARAM_STORE_PFN is theoretically possible, it is not a good choice
and zero has never been validly used in that capacity.
Cc: stable(a)vger.kernel.org
Signed-off-by: Stefano Stabellini <stefano.stabellini(a)xilinx.com>
---
drivers/xen/xenbus/xenbus_probe.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 94405bb3829e..c89de0062399 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -951,6 +951,18 @@ static int __init xenbus_init(void)
err = hvm_get_parameter(HVM_PARAM_STORE_PFN, &v);
if (err)
goto out_error;
+ /*
+ * Uninitialized hvm_params are zero and return no error.
+ * Although it is theoretically possible to have
+ * HVM_PARAM_STORE_PFN set to zero on purpose, in reality it is
+ * not zero when valid. If zero, it means that Xenstore hasn't
+ * been properly initialized. Instead of attempting to map a
+ * wrong guest physical address return error.
+ */
+ if (v == 0) {
+ err = -ENOENT;
+ goto out_error;
+ }
xen_store_gfn = (unsigned long)v;
xen_store_interface =
xen_remap(xen_store_gfn << XEN_PAGE_SHIFT,
--
2.25.1
From: Alexander Sverdlin <alexander.sverdlin(a)nokia.com>
Erase can be zeroed in spi_nor_parse_4bait() or
spi_nor_init_non_uniform_erase_map(). In practice it happened with
mt25qu256a, which supports 4K, 32K, 64K erases with 3b address commands,
but only 4K and 64K erase with 4b address commands.
Fixes: dc92843159a7 ("mtd: spi-nor: fix erase_type array to indicate current map conf")
Cc: stable(a)vger.kernel.org
Signed-off-by: Alexander Sverdlin <alexander.sverdlin(a)nokia.com>
---
drivers/mtd/spi-nor/core.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
index 88dd090..183ea9d 100644
--- a/drivers/mtd/spi-nor/core.c
+++ b/drivers/mtd/spi-nor/core.c
@@ -1400,6 +1400,8 @@ spi_nor_find_best_erase_type(const struct spi_nor_erase_map *map,
continue;
erase = &map->erase_type[i];
+ if (!erase->opcode)
+ continue;
/* Alignment is not mandatory for overlaid regions */
if (region->offset & SNOR_OVERLAID_REGION &&
--
2.10.2
The patch titled
Subject: hugetlb, userfaultfd: fix reservation restore on userfaultfd error
has been added to the -mm tree. Its filename is
hugetlb-userfaultfd-fix-reservation-restore-on-userfaultfd-error.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/hugetlb-userfaultfd-fix-reservati…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/hugetlb-userfaultfd-fix-reservati…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Mina Almasry <almasrymina(a)google.com>
Subject: hugetlb, userfaultfd: fix reservation restore on userfaultfd error
Currently in the is_continue case in hugetlb_mcopy_atomic_pte(), if we
bail out using "goto out_release_unlock;" in the cases where idx >= size,
or !huge_pte_none(), the code will detect that new_pagecache_page ==
false, and so call restore_reserve_on_error(). In this case I see
restore_reserve_on_error() delete the reservation, and the following call
to remove_inode_hugepages() will increment h->resv_hugepages causing a
100% reproducible leak.
We should treat the is_continue case similar to adding a page into the
pagecache and set new_pagecache_page to true, to indicate that there is no
reservation to restore on the error path, and we need not call
restore_reserve_on_error(). Rename new_pagecache_page to
page_in_pagecache to make that clear.
Link: https://lkml.kernel.org/r/20211117193825.378528-1-almasrymina@google.com
Fixes: c7b1850dfb41 ("hugetlb: don't pass page cache pages to restore_reserve_on_error")
Signed-off-by: Mina Almasry <almasrymina(a)google.com>
Reported-by: James Houghton <jthoughton(a)google.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Wei Xu <weixugc(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--- a/mm/hugetlb.c~hugetlb-userfaultfd-fix-reservation-restore-on-userfaultfd-error
+++ a/mm/hugetlb.c
@@ -5734,13 +5734,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_s
int ret = -ENOMEM;
struct page *page;
int writable;
- bool new_pagecache_page = false;
+ bool page_in_pagecache = false;
if (is_continue) {
ret = -EFAULT;
page = find_lock_page(mapping, idx);
if (!page)
goto out;
+ page_in_pagecache = true;
} else if (!*pagep) {
/* If a page already exists, then it's UFFDIO_COPY for
* a non-missing case. Return -EEXIST.
@@ -5828,7 +5829,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_s
ret = huge_add_to_page_cache(page, mapping, idx);
if (ret)
goto out_release_nounlock;
- new_pagecache_page = true;
+ page_in_pagecache = true;
}
ptl = huge_pte_lockptr(h, dst_mm, dst_pte);
@@ -5892,7 +5893,7 @@ out_release_unlock:
if (vm_shared || is_continue)
unlock_page(page);
out_release_nounlock:
- if (!new_pagecache_page)
+ if (!page_in_pagecache)
restore_reserve_on_error(h, dst_vma, dst_addr, page);
put_page(page);
goto out;
_
Patches currently in -mm which might be from almasrymina(a)google.com are
hugetlb-userfaultfd-fix-reservation-restore-on-userfaultfd-error.patch
The patch titled
Subject: hugetlb, userfaultfd: fix reservation restore on userfaultfd error
has been added to the -mm tree. Its filename is
hugetlb-userfaultfd-fix-reservation-restore-on-userfaultfd-error.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/hugetlb-userfaultfd-fix-reservati…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/hugetlb-userfaultfd-fix-reservati…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Mina Almasry <almasrymina(a)google.com>
Subject: hugetlb, userfaultfd: fix reservation restore on userfaultfd error
Currently in the is_continue case in hugetlb_mcopy_atomic_pte(), if we
bail out using "goto out_release_unlock;" in the cases where idx >= size,
or !huge_pte_none(), the code will detect that new_pagecache_page ==
false, and so call restore_reserve_on_error(). In this case I see
restore_reserve_on_error() delete the reservation, and the following call
to remove_inode_hugepages() will increment h->resv_hugepages causing a
100% reproducible leak.
We should treat the is_continue case similar to adding a page into the
pagecache and set new_pagecache_page to true, to indicate that there is no
reservation to restore on the error path, and we need not call
restore_reserve_on_error(). Rename new_pagecache_page to
page_in_pagecache to make that clear.
Link: https://lkml.kernel.org/r/20211117193825.378528-1-almasrymina@google.com
Fixes: c7b1850dfb41 ("hugetlb: don't pass page cache pages to restore_reserve_on_error")
Signed-off-by: Mina Almasry <almasrymina(a)google.com>
Reported-by: James Houghton <jthoughton(a)google.com>
Cc: Wei Xu <weixugc(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--- a/mm/hugetlb.c~hugetlb-userfaultfd-fix-reservation-restore-on-userfaultfd-error
+++ a/mm/hugetlb.c
@@ -5734,13 +5734,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_s
int ret = -ENOMEM;
struct page *page;
int writable;
- bool new_pagecache_page = false;
+ bool page_in_pagecache = false;
if (is_continue) {
ret = -EFAULT;
page = find_lock_page(mapping, idx);
if (!page)
goto out;
+ page_in_pagecache = true;
} else if (!*pagep) {
/* If a page already exists, then it's UFFDIO_COPY for
* a non-missing case. Return -EEXIST.
@@ -5828,7 +5829,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_s
ret = huge_add_to_page_cache(page, mapping, idx);
if (ret)
goto out_release_nounlock;
- new_pagecache_page = true;
+ page_in_pagecache = true;
}
ptl = huge_pte_lockptr(h, dst_mm, dst_pte);
@@ -5892,7 +5893,7 @@ out_release_unlock:
if (vm_shared || is_continue)
unlock_page(page);
out_release_nounlock:
- if (!new_pagecache_page)
+ if (!page_in_pagecache)
restore_reserve_on_error(h, dst_vma, dst_addr, page);
put_page(page);
goto out;
_
Patches currently in -mm which might be from almasrymina(a)google.com are
hugetlb-userfaultfd-fix-reservation-restore-on-userfaultfd-error.patch
The patch titled
Subject: hugetlb, userfaultfd: fix reservation restore on userfaultfd error
has been removed from the -mm tree. Its filename was
hugetlb-userfaultfd-fix-reservation-restore-on-userfaultfd-error.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Mina Almasry <almasrymina(a)google.com>
Subject: hugetlb, userfaultfd: fix reservation restore on userfaultfd error
Currently in the is_continue case in hugetlb_mcopy_atomic_pte(), if we
bail out using "goto out_release_unlock;" in the cases where idx >= size,
or !huge_pte_none(), the code will detect that new_pagecache_page ==
false, and so call restore_reserve_on_error(). In this case I see
restore_reserve_on_error() delete the reservation, and the following call
to remove_inode_hugepages() will increment h->resv_hugepages causing a
100% reproducible leak.
We should treat the is_continue case similar to adding a page into the
pagecache and set new_pagecache_page to true, to indicate that there is no
reservation to restore on the error path, and we need not call
restore_reserve_on_error().
Link: https://lkml.kernel.org/r/20211116235733.3774702-1-almasrymina@google.com
Fixes: c7b1850dfb41 ("hugetlb: don't pass page cache pages to restore_reserve_on_error")
Signed-off-by: Mina Almasry <almasrymina(a)google.com>
Reported-by: James Houghton <jthoughton(a)google.com>
Cc: Wei Xu <weixugc(a)google.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/mm/hugetlb.c~hugetlb-userfaultfd-fix-reservation-restore-on-userfaultfd-error
+++ a/mm/hugetlb.c
@@ -5743,6 +5743,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_s
page = find_lock_page(mapping, idx);
if (!page)
goto out;
+ /*
+ * Set new_pagecache_page to true, as we've added a page to the
+ * pagecache, but userfaultfd hasn't set up a mapping for this
+ * page yet. If we bail out before setting up the mapping, we
+ * want to indicate to restore_reserve_on_error() that we've
+ * added the page to the page cache.
+ */
+ new_pagecache_page = true;
} else if (!*pagep) {
/* If a page already exists, then it's UFFDIO_COPY for
* a non-missing case. Return -EEXIST.
_
Patches currently in -mm which might be from almasrymina(a)google.com are
The IRTE for an assigned device can trigger a POSTED_INTR_VECTOR even
if APICv is disabled on the vCPU that receives it. In that case, the
interrupt will just cause a vmexit and leave the ON bit set together
with the PIR bit corresponding to the interrupt.
Right now, the interrupt would not be delivered until APICv is re-enabled.
However, fixing this is just a matter of always doing the PIR->IRR
synchronization, even if the vCPU has temporarily disabled APICv.
This is not a problem for performance, or if anything it is an
improvement. First, in the common case where vcpu->arch.apicv_active is
true, one fewer check has to be performed. Second, static_call_cond will
elide the function call if APICv is not present or disabled. Finally,
in the case for AMD hardware we can remove the sync_pir_to_irr callback:
it is only needed for apic_has_interrupt_for_ppr, and that function
already has a fallback for !APICv.
Cc: stable(a)vger.kernel.org
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/lapic.c | 2 +-
arch/x86/kvm/svm/svm.c | 1 -
arch/x86/kvm/x86.c | 18 +++++++++---------
3 files changed, 10 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 759952dd1222..f206fc35deff 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -707,7 +707,7 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
static int apic_has_interrupt_for_ppr(struct kvm_lapic *apic, u32 ppr)
{
int highest_irr;
- if (apic->vcpu->arch.apicv_active)
+ if (kvm_x86_ops.sync_pir_to_irr)
highest_irr = static_call(kvm_x86_sync_pir_to_irr)(apic->vcpu);
else
highest_irr = apic_find_highest_irr(apic);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 5630c241d5f6..d0f68d11ec70 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4651,7 +4651,6 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_irr_update = svm_hwapic_irr_update,
.hwapic_isr_update = svm_hwapic_isr_update,
- .sync_pir_to_irr = kvm_lapic_find_highest_irr,
.apicv_post_state_restore = avic_post_state_restore,
.set_tss_addr = svm_set_tss_addr,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 627c955101a0..a8f12c83db4b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4448,8 +4448,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
struct kvm_lapic_state *s)
{
- if (vcpu->arch.apicv_active)
- static_call(kvm_x86_sync_pir_to_irr)(vcpu);
+ static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
return kvm_apic_get_state(vcpu, s);
}
@@ -9528,8 +9527,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
if (irqchip_split(vcpu->kvm))
kvm_scan_ioapic_routes(vcpu, vcpu->arch.ioapic_handled_vectors);
else {
- if (vcpu->arch.apicv_active)
- static_call(kvm_x86_sync_pir_to_irr)(vcpu);
+ static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
if (ioapic_in_kernel(vcpu->kvm))
kvm_ioapic_scan_entry(vcpu, vcpu->arch.ioapic_handled_vectors);
}
@@ -9802,10 +9800,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
/*
* This handles the case where a posted interrupt was
- * notified with kvm_vcpu_kick.
+ * notified with kvm_vcpu_kick. Assigned devices can
+ * use the POSTED_INTR_VECTOR even if APICv is disabled,
+ * so do it even if !kvm_vcpu_apicv_active(vcpu).
*/
- if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active)
- static_call(kvm_x86_sync_pir_to_irr)(vcpu);
+ if (kvm_lapic_enabled(vcpu))
+ static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
if (kvm_vcpu_exit_request(vcpu)) {
vcpu->mode = OUTSIDE_GUEST_MODE;
@@ -9849,8 +9849,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
break;
- if (kvm_lapic_enabled(vcpu) && kvm->arch.apicv_active)
- static_call(kvm_x86_sync_pir_to_irr)(vcpu);
+ if (kvm_lapic_enabled(vcpu))
+ static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
if (unlikely(kvm_vcpu_exit_request(vcpu))) {
exit_fastpath = EXIT_FASTPATH_EXIT_HANDLED;
--
2.27.0
From: Zhengjun Xing <zhengjun.xing(a)linux.intel.com>
The user recently report a perf issue in the ICX platform, when test by
perf event “uncore_imc_x/cas_count_write”,the write bandwidth is always
very small (only 0.38MB/s), it is caused by the wrong "umask" for the
"cas_count_write" event. When double-checking, find "cas_count_read"
also is wrong.
The public document for ICX uncore:
https://www.intel.com/content/www/us/en/develop/download/3rd-gen-intel-xeon…
On page 142, Table 2-143, defines Unit Masks for CAS_COUNT:
RD b00001111
WR b00110000
So Corrected both "cas_count_read" and "cas_count_write" for ICX.
Old settings:
hswep_uncore_imc_events
INTEL_UNCORE_EVENT_DESC(cas_count_read, "event=0x04,umask=0x03")
INTEL_UNCORE_EVENT_DESC(cas_count_write, "event=0x04,umask=0x0c")
New settings:
snr_uncore_imc_events
INTEL_UNCORE_EVENT_DESC(cas_count_read, "event=0x04,umask=0x0f")
INTEL_UNCORE_EVENT_DESC(cas_count_write, "event=0x04,umask=0x30"),
Fixes: 2b3b76b5ec67 ("perf/x86/intel/uncore: Add Ice Lake server uncore support")
Reviewed-by: Adrian Hunter <adrian.hunter(a)intel.com>
Signed-off-by: Zhengjun Xing <zhengjun.xing(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
---
arch/x86/events/intel/uncore_snbep.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index 5ddc0f30db6f..a6fd8eb410a9 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -5468,7 +5468,7 @@ static struct intel_uncore_type icx_uncore_imc = {
.fixed_ctr_bits = 48,
.fixed_ctr = SNR_IMC_MMIO_PMON_FIXED_CTR,
.fixed_ctl = SNR_IMC_MMIO_PMON_FIXED_CTL,
- .event_descs = hswep_uncore_imc_events,
+ .event_descs = snr_uncore_imc_events,
.perf_ctr = SNR_IMC_MMIO_PMON_CTR0,
.event_ctl = SNR_IMC_MMIO_PMON_CTL0,
.event_mask = SNBEP_PMON_RAW_EVENT_MASK,
--
2.25.1
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 5c7df80e2ce4c954c80eb4ecf5fa002a5ff5d2d6
Gitweb: https://git.kernel.org/tip/5c7df80e2ce4c954c80eb4ecf5fa002a5ff5d2d6
Author: Sean Christopherson <seanjc(a)google.com>
AuthorDate: Thu, 11 Nov 2021 02:07:23
Committer: Peter Zijlstra <peterz(a)infradead.org>
CommitterDate: Wed, 17 Nov 2021 14:49:06 +01:00
KVM: x86: Register perf callbacks after calling vendor's hardware_setup()
Wait to register perf callbacks until after doing vendor hardaware setup.
VMX's hardware_setup() configures Intel Processor Trace (PT) mode, and a
future fix to register the Intel PT guest interrupt hook if and only if
Intel PT is exposed to the guest will consume the configured PT mode.
Delaying registration to hardware setup is effectively a nop as KVM's perf
hooks all pivot on the per-CPU current_vcpu, which is non-NULL only when
KVM is handling an IRQ/NMI in a VM-Exit path. I.e. current_vcpu will be
NULL throughout both kvm_arch_init() and kvm_arch_hardware_setup().
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Acked-by: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20211111020738.2512932-3-seanjc@google.com
---
arch/x86/kvm/x86.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dc7eb5f..50f0cd1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8626,8 +8626,6 @@ int kvm_arch_init(void *opaque)
kvm_timer_init();
- perf_register_guest_info_callbacks(&kvm_guest_cbs);
-
if (boot_cpu_has(X86_FEATURE_XSAVE)) {
host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
supported_xcr0 = host_xcr0 & KVM_SUPPORTED_XCR0;
@@ -8659,7 +8657,6 @@ void kvm_arch_exit(void)
clear_hv_tscchange_cb();
#endif
kvm_lapic_exit();
- perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
cpufreq_unregister_notifier(&kvmclock_cpufreq_notifier_block,
@@ -11225,6 +11222,8 @@ int kvm_arch_hardware_setup(void *opaque)
memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
kvm_ops_static_call_update();
+ perf_register_guest_info_callbacks(&kvm_guest_cbs);
+
if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
supported_xss = 0;
@@ -11252,6 +11251,8 @@ int kvm_arch_hardware_setup(void *opaque)
void kvm_arch_hardware_unsetup(void)
{
+ perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
+
static_call(kvm_x86_hardware_unsetup)();
}
From: Orson Zhai <orson.zhai(a)unisoc.com>
Hi Greg,
Change v1->v2:
- Remove Change-id in commit message.
- Fix build error for one struct member missing.
I am sorry for my careless about not testing for build before submitting.
-----
Following 2 patches were merged into 5.10.y but not in 5.4.y.
We've found kernel crashes on our devices with 5.4 stable caused by missing them.
Please feel free to add them into the stable queue for 5.4.y if no issue.
Thanks,
Orson
Adrian Hunter (1):
scsi: ufs: Fix interrupt error message for shared interrupts
Jaegeuk Kim (1):
scsi: ufs: Fix tm request when non-fatal error happens
drivers/scsi/ufs/ufshcd.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
--
2.7.4
Hello everyone,
The following new device USB ID has landed in linux-next recently:
4fd6d4907961 ("Bluetooth: btusb: Add support for TP-Link UB500 Adapter")
It would be nice if it could be backported to stable. I verified it
works on 5.14.y as a simple cherry-pick .
Thank you
Dzień dobry,
dostrzegam możliwość współpracy z Państwa firmą.
Świadczymy kompleksową obsługę inwestycji w fotowoltaikę, która obniża koszty energii elektrycznej nawet o 90%.
Czy są Państwo zainteresowani weryfikacją wstępnych propozycji?
Pozdrawiam,
Miłosz Nowak
commit 39fec6889d15a658c3a3ebb06fd69d3584ddffd3 upstream.
Ext4 file system has default lazy inode table initialization setup once
it is mounted. However, it has issue on computing the next schedule time
that makes the timeout same amount in jiffies but different real time in
secs if with various HZ values. Therefore, fix by measuring the current
time in a more granular unit nanoseconds and make the next schedule time
independent of the HZ value.
Fixes: bfff68738f1c ("ext4: add support for lazy inode table initialization")
Signed-off-by: Shaoying Xu <shaoyi(a)amazon.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Link: https://lore.kernel.org/r/20210902164412.9994-2-shaoyi@amazon.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
---
Member lr_sbi was removed from the struct ext4_li_request since kernel 5.9
so the way to access s_li_wait_mult was also changed. To adapt to the old
kernel versions, adjust the upstream fix by following the old ext4_li_request
strucutre.
fs/ext4/super.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 1211ae203fac..f68dfef5939f 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3071,8 +3071,8 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
struct ext4_group_desc *gdp = NULL;
ext4_group_t group, ngroups;
struct super_block *sb;
- unsigned long timeout = 0;
int ret = 0;
+ u64 start_time;
sb = elr->lr_super;
ngroups = EXT4_SB(sb)->s_groups_count;
@@ -3092,13 +3092,12 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
ret = 1;
if (!ret) {
- timeout = jiffies;
+ start_time = ktime_get_real_ns();
ret = ext4_init_inode_table(sb, group,
elr->lr_timeout ? 0 : 1);
if (elr->lr_timeout == 0) {
- timeout = (jiffies - timeout) *
- elr->lr_sbi->s_li_wait_mult;
- elr->lr_timeout = timeout;
+ elr->lr_timeout = nsecs_to_jiffies((ktime_get_real_ns() - start_time) *
+ elr->lr_sbi->s_li_wait_mult);
}
elr->lr_next_sched = jiffies + elr->lr_timeout;
elr->lr_next_group = group + 1;
--
2.16.6
From: Zhengjun Xing <zhengjun.xing(a)linux.intel.com>
The user recently report a perf issue in the ICX platform, when test by
perf event “uncore_imc_x/cas_count_write”,the write bandwidth is always
very small (only 0.38MB/s), it is caused by the wrong "umask" for the
"cas_count_write" event. When double-checking, find "cas_count_read"
also is wrong.
The public document for ICX uncore:
https://www.intel.com/content/www/us/en/develop/download/3rd-gen-intel-xeon…
On page 142, Table 2-143, defines Unit Masks for CAS_COUNT:
RD b00001111
WR b00110000
So Corrected both "cas_count_read" and "cas_count_write" for ICX.
Old settings:
hswep_uncore_imc_events
INTEL_UNCORE_EVENT_DESC(cas_count_read, "event=0x04,umask=0x03")
INTEL_UNCORE_EVENT_DESC(cas_count_write, "event=0x04,umask=0x0c")
New settings:
snr_uncore_imc_events
INTEL_UNCORE_EVENT_DESC(cas_count_read, "event=0x04,umask=0x0f")
INTEL_UNCORE_EVENT_DESC(cas_count_write, "event=0x04,umask=0x30"),
Fixes: 2b3b76b5ec67 ("perf/x86/intel/uncore: Add Ice Lake server uncore support")
Reviewed-by: Adrian Hunter <adrian.hunter(a)intel.com>
Signed-off-by: Zhengjun Xing <zhengjun.xing(a)linux.intel.com>
---
arch/x86/events/intel/uncore_snbep.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index 5ddc0f30db6f..a6fd8eb410a9 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -5468,7 +5468,7 @@ static struct intel_uncore_type icx_uncore_imc = {
.fixed_ctr_bits = 48,
.fixed_ctr = SNR_IMC_MMIO_PMON_FIXED_CTR,
.fixed_ctl = SNR_IMC_MMIO_PMON_FIXED_CTL,
- .event_descs = hswep_uncore_imc_events,
+ .event_descs = snr_uncore_imc_events,
.perf_ctr = SNR_IMC_MMIO_PMON_CTR0,
.event_ctl = SNR_IMC_MMIO_PMON_CTL0,
.event_mask = SNBEP_PMON_RAW_EVENT_MASK,
--
2.25.1
There is no reason for shutting down MHI ungracefully on freeze,
this causes the MHI host stack & device stack to not be aligned
anymore since the proper MHI reset sequence is not performed for
ungraceful shutdown.
Cc: stable(a)vger.kernel.org
Fixes: 5f0c2ee1fe8d ("bus: mhi: pci-generic: Fix hibernation")
Suggested-by: Bhaumik Bhatt <bbhatt(a)codeaurora.org>
Signed-off-by: Loic Poulain <loic.poulain(a)linaro.org>
Reviewed-by: Bhaumik Bhatt <bbhatt(a)codeaurora.org>
Reviewed-by: Hemant Kumar <hemantk(a)codeaurora.org>
---
v2: Forgot to mention this change comes from a Bhaumik suggestion
drivers/bus/mhi/pci_generic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/bus/mhi/pci_generic.c b/drivers/bus/mhi/pci_generic.c
index 6a42425..d4a3ce2 100644
--- a/drivers/bus/mhi/pci_generic.c
+++ b/drivers/bus/mhi/pci_generic.c
@@ -1018,7 +1018,7 @@ static int __maybe_unused mhi_pci_freeze(struct device *dev)
* context.
*/
if (test_and_clear_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status)) {
- mhi_power_down(mhi_cntrl, false);
+ mhi_power_down(mhi_cntrl, true);
mhi_unprepare_after_power_down(mhi_cntrl);
}
--
2.7.4
Some devices tend to trigger SYS_ERR interrupt while the host handling
SYS_ERR state of the device during power up. This creates a race
condition and causes a failure in booting up the device.
The issue is seen on the Sierra Wireless EM9191 modem during SYS_ERR
handling in mhi_async_power_up(). Once the host detects that the device
is in SYS_ERR state, it issues MHI_RESET and waits for the device to
process the reset request. During this time, the device triggers SYS_ERR
interrupt to the host and host starts handling SYS_ERR execution.
So by the time the device has completed reset, host starts SYS_ERR
handling. This causes the race condition and the modem fails to boot.
Hence, register the IRQ handler only after handling the SYS_ERR check
to avoid getting spurious IRQs from the device.
Cc: stable(a)vger.kernel.org
Fixes: e18d4e9fa79b ("bus: mhi: core: Handle syserr during power_up")
Reported-by: Aleksander Morgado <aleksander(a)aleksander.es>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
Changes in v2:
* Switched to "mhi_poll_reg_field" for detecting MHI reset in device.
drivers/bus/mhi/core/pm.c | 32 ++++++++++----------------------
1 file changed, 10 insertions(+), 22 deletions(-)
diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index fb99e3727155..3c347fe9b10d 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -1038,7 +1038,6 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
enum mhi_ee_type current_ee;
enum dev_st_transition next_state;
struct device *dev = &mhi_cntrl->mhi_dev->dev;
- u32 val;
int ret;
dev_info(dev, "Requested to power ON\n");
@@ -1055,10 +1054,6 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
mutex_lock(&mhi_cntrl->pm_mutex);
mhi_cntrl->pm_state = MHI_PM_DISABLE;
- ret = mhi_init_irq_setup(mhi_cntrl);
- if (ret)
- goto error_setup_irq;
-
/* Setup BHI INTVEC */
write_lock_irq(&mhi_cntrl->pm_lock);
mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0);
@@ -1072,7 +1067,7 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
dev_err(dev, "%s is not a valid EE for power on\n",
TO_MHI_EXEC_STR(current_ee));
ret = -EIO;
- goto error_async_power_up;
+ goto error_setup_irq;
}
state = mhi_get_mhi_state(mhi_cntrl);
@@ -1081,20 +1076,12 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
if (state == MHI_STATE_SYS_ERR) {
mhi_set_mhi_state(mhi_cntrl, MHI_STATE_RESET);
- ret = wait_event_timeout(mhi_cntrl->state_event,
- MHI_PM_IN_FATAL_STATE(mhi_cntrl->pm_state) ||
- mhi_read_reg_field(mhi_cntrl,
- mhi_cntrl->regs,
- MHICTRL,
- MHICTRL_RESET_MASK,
- MHICTRL_RESET_SHIFT,
- &val) ||
- !val,
- msecs_to_jiffies(mhi_cntrl->timeout_ms));
- if (!ret) {
- ret = -EIO;
+ ret = mhi_poll_reg_field(mhi_cntrl, mhi_cntrl->regs, MHICTRL,
+ MHICTRL_RESET_MASK, MHICTRL_RESET_SHIFT, 0,
+ msecs_to_jiffies(mhi_cntrl->timeout_ms));
+ if (ret) {
dev_info(dev, "Failed to reset MHI due to syserr state\n");
- goto error_async_power_up;
+ goto error_setup_irq;
}
/*
@@ -1104,6 +1091,10 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0);
}
+ ret = mhi_init_irq_setup(mhi_cntrl);
+ if (ret)
+ goto error_setup_irq;
+
/* Transition to next state */
next_state = MHI_IN_PBL(current_ee) ?
DEV_ST_TRANSITION_PBL : DEV_ST_TRANSITION_READY;
@@ -1116,9 +1107,6 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
return 0;
-error_async_power_up:
- mhi_deinit_free_irq(mhi_cntrl);
-
error_setup_irq:
mhi_cntrl->pm_state = MHI_PM_DISABLE;
mutex_unlock(&mhi_cntrl->pm_mutex);
--
2.25.1
kmemdup can return a null pointer so need to check for it, otherwise
the null key will be dereferenced later in tipc_crypto_key_xmit as
can be seen in the trace [1].
Cc: Jon Maloy <jmaloy(a)redhat.com>
Cc: Ying Xue <ying.xue(a)windriver.com>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: netdev(a)vger.kernel.org
Cc: tipc-discussion(a)lists.sourceforge.net
Cc: linux-kernel(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # 5.15, 5.14, 5.10
[1] https://syzkaller.appspot.com/bug?id=bca180abb29567b189efdbdb34cbf7ba851c2a…
Reported-by: Dmitry Vyukov <dvyukov(a)google.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk(a)linaro.org>
---
Changed in v2:
- use tipc_aead_free() to free all crytpo tfm instances
that might have been allocated before the fail.
---
net/tipc/crypto.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/tipc/crypto.c b/net/tipc/crypto.c
index dc60c32bb70d..d293614d5fc6 100644
--- a/net/tipc/crypto.c
+++ b/net/tipc/crypto.c
@@ -597,6 +597,10 @@ static int tipc_aead_init(struct tipc_aead **aead, struct tipc_aead_key *ukey,
tmp->cloned = NULL;
tmp->authsize = TIPC_AES_GCM_TAG_SIZE;
tmp->key = kmemdup(ukey, tipc_aead_key_size(ukey), GFP_KERNEL);
+ if (!tmp->key) {
+ tipc_aead_free(&tmp->rcu);
+ return -ENOMEM;
+ }
memcpy(&tmp->salt, ukey->key + keylen, TIPC_AES_GCM_SALT_SIZE);
atomic_set(&tmp->users, 0);
atomic64_set(&tmp->seqno, 0);
--
2.33.1
After fixing the handling of POSTED_INTR_WAKEUP_VECTOR for vCPUs with
disabled APICv, take care of POSTED_INTR_VECTOR. The IRTE for an assigned
device can trigger a POSTED_INTR_VECTOR even if APICv is disabled on the
vCPU that receives it. In that case, the interrupt will just cause a
vmexit and leave the ON bit set together with the PIR bit corresponding
to the interrupt.
Right now, the interrupt would not be delivered until APICv is re-enabled.
However, fixing this is just a matter of always doing the PIR->IRR
synchronization, even if the vCPU does not have APICv enabled.
This is not a problem for performance, or if anything it is an
improvement. static_call_cond will elide the function call if APICv is
not present or disabled, or if (as is the case for AMD hardware) it does
not require a sync_pir_to_irr callback. And in the common case where
kvm_vcpu_apicv_active(vcpu) is true, one fewer check has to be performed.
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/x86.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dcefb1485362..eda86378dcff 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4445,8 +4445,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
struct kvm_lapic_state *s)
{
- if (kvm_vcpu_apicv_active(vcpu))
- static_call(kvm_x86_sync_pir_to_irr)(vcpu);
+ static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
return kvm_apic_get_state(vcpu, s);
}
@@ -9645,8 +9644,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
if (irqchip_split(vcpu->kvm))
kvm_scan_ioapic_routes(vcpu, vcpu->arch.ioapic_handled_vectors);
else {
- if (kvm_vcpu_apicv_active(vcpu))
- static_call(kvm_x86_sync_pir_to_irr)(vcpu);
+ static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
if (ioapic_in_kernel(vcpu->kvm))
kvm_ioapic_scan_entry(vcpu, vcpu->arch.ioapic_handled_vectors);
}
@@ -9919,10 +9917,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
/*
* This handles the case where a posted interrupt was
- * notified with kvm_vcpu_kick.
+ * notified with kvm_vcpu_kick. Assigned devices can
+ * use the POSTED_INTR_VECTOR even if APICv is disabled,
+ * so do it even if !kvm_vcpu_apicv_active(vcpu).
*/
- if (kvm_lapic_enabled(vcpu) && kvm_vcpu_apicv_active(vcpu))
- static_call(kvm_x86_sync_pir_to_irr)(vcpu);
+ if (kvm_lapic_enabled(vcpu))
+ static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
if (kvm_vcpu_exit_request(vcpu)) {
vcpu->mode = OUTSIDE_GUEST_MODE;
--
2.27.0
Currently in the is_continue case in hugetlb_mcopy_atomic_pte(), if we
bail out using "goto out_release_unlock;" in the cases where idx >=
size, or !huge_pte_none(), the code will detect that new_pagecache_page
== false, and so call restore_reserve_on_error().
In this case I see restore_reserve_on_error() delete the reservation,
and the following call to remove_inode_hugepages() will increment
h->resv_hugepages causing a 100% reproducible leak.
We should treat the is_continue case similar to adding a page into the
pagecache and set new_pagecache_page to true, to indicate that there is
no reservation to restore on the error path, and we need not call
restore_reserve_on_error(). Rename new_pagecache_page to
page_in_pagecache to make that clear.
Cc: Wei Xu <weixugc(a)google.com>
Cc: stable(a)vger.kernel.org
Fixes: c7b1850dfb41 ("hugetlb: don't pass page cache pages to restore_reserve_on_error")
Signed-off-by: Mina Almasry <almasrymina(a)google.com>
Reported-by: James Houghton <jthoughton(a)google.com>
---
Changes in v2:
- Renamed new_pagecache_page to page_in_pagecache
- Removed unnecessary comment after the name update.
- Cc: stable
---
mm/hugetlb.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e09159c957e3..e7ebc4b355cf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5734,13 +5734,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
int ret = -ENOMEM;
struct page *page;
int writable;
- bool new_pagecache_page = false;
+ bool page_in_pagecache = false;
if (is_continue) {
ret = -EFAULT;
page = find_lock_page(mapping, idx);
if (!page)
goto out;
+ page_in_pagecache = true;
} else if (!*pagep) {
/* If a page already exists, then it's UFFDIO_COPY for
* a non-missing case. Return -EEXIST.
@@ -5828,7 +5829,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
ret = huge_add_to_page_cache(page, mapping, idx);
if (ret)
goto out_release_nounlock;
- new_pagecache_page = true;
+ page_in_pagecache = true;
}
ptl = huge_pte_lockptr(h, dst_mm, dst_pte);
@@ -5892,7 +5893,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
if (vm_shared || is_continue)
unlock_page(page);
out_release_nounlock:
- if (!new_pagecache_page)
+ if (!page_in_pagecache)
restore_reserve_on_error(h, dst_vma, dst_addr, page);
put_page(page);
goto out;
--
2.34.0.rc2.393.gf8c9666880-goog
Hi Greg et al,
Please include the following mainline commit:
commit d3c4b6f64ad356c0d9ddbcf73fa471e6a841cc5c
Author: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Date: Wed Sep 29 18:31:25 2021 +0200
ACPICA: Avoid evaluating methods too early during system resume
into all applicable -stable series.
It fixes resume from suspend-to-RAM on multiple systems.
Thanks!
From: Corentin Labbe <clabbe.montjoie(a)gmail.com>
commit 60f786525032432af1b7d9b8935cb12936244ccd upstream
mdio_mux_uninit() call put_device (unconditionally) because of
of_mdio_find_bus() in mdio_mux_init.
But of_mdio_find_bus is only called if mux_bus is empty.
If mux_bus is set, mdio_mux_uninit will print a "refcount_t: underflow"
trace.
This patch add a get_device in the other branch of "if (mux_bus)".
Signed-off-by: Corentin Labbe <clabbe.montjoie(a)gmail.com>
Reviewed-by: Andrew Lunn <andrew(a)lunn.ch>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Florian Fainelli <f.fainelli(a)gmail.com>
---
Note: this patch did not get any fixes tag, but it does fix issues
introduced by fdf3b78df4d2 ("mdio: mux: Correct mdio_mux_init error
path issues").
Thanks!
drivers/net/phy/mdio-mux.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/net/phy/mdio-mux.c b/drivers/net/phy/mdio-mux.c
index 599ce24c514f..456b64248e5d 100644
--- a/drivers/net/phy/mdio-mux.c
+++ b/drivers/net/phy/mdio-mux.c
@@ -117,6 +117,7 @@ int mdio_mux_init(struct device *dev,
} else {
parent_bus_node = NULL;
parent_bus = mux_bus;
+ get_device(&parent_bus->dev);
}
pb = devm_kzalloc(dev, sizeof(*pb), GFP_KERNEL);
@@ -182,9 +183,7 @@ int mdio_mux_init(struct device *dev,
devm_kfree(dev, pb);
err_pb_kz:
- /* balance the reference of_mdio_find_bus() took */
- if (!mux_bus)
- put_device(&parent_bus->dev);
+ put_device(&parent_bus->dev);
err_parent_bus:
of_node_put(parent_bus_node);
return ret_val;
@@ -202,7 +201,6 @@ void mdio_mux_uninit(void *mux_handle)
cb = cb->next;
}
- /* balance the reference of_mdio_find_bus() in mdio_mux_init() took */
put_device(&pb->mii_bus->dev);
}
EXPORT_SYMBOL_GPL(mdio_mux_uninit);
--
2.25.1
Fix assembly errors like:
{standard input}: Assembler messages:
{standard input}:287: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32'
{standard input}:680: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32'
{standard input}:1274: Error: opcode not supported on this processor: mips3 (mips3) `dins $12,$9,32,32'
{standard input}:2175: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32'
make[1]: *** [scripts/Makefile.build:277: mm/highmem.o] Error 1
with code produced from `__cmpxchg64' for MIPS64r2 CPU configurations
using CONFIG_32BIT and CONFIG_PHYS_ADDR_T_64BIT.
This is due to MIPS_ISA_ARCH_LEVEL downgrading the assembly architecture
to `r4000' i.e. MIPS III for MIPS64r2 configurations, while there is a
block of code containing a DINS MIPS64r2 instruction conditionalized on
MIPS_ISA_REV >= 2 within the scope of the downgrade.
The assembly architecture override code pattern has been put there for
LL/SC instructions, so that code compiles for configurations that select
a processor to build for that does not support these instructions while
still providing run-time support for processors that do, dynamically
switched by non-constant `cpu_has_llsc'. It went in with linux-mips.org
commit aac8aa7717a2 ("Enable a suitable ISA for the assembler around
ll/sc so that code builds even for processors that don't support the
instructions. Plus minor formatting fixes.") back in 2005.
Fix the problem by wrapping these instructions along with the adjacent
SYNC instructions only, following the practice established with commit
cfd54de3b0e4 ("MIPS: Avoid move psuedo-instruction whilst using
MIPS_ISA_LEVEL") and commit 378ed6f0e3c5 ("MIPS: Avoid using .set mips0
to restore ISA"). Strictly speaking the SYNC instructions do not have
to be wrapped as they are only used as a Loongson3 erratum workaround,
so they will be enabled in the assembler by default, but do this so as
to keep code consistent with other places.
Reported-by: kernel test robot <lkp(a)intel.com>
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Fixes: c7e2d71dda7a ("MIPS: Fix set_pte() for Netlogic XLR using cmpxchg64()")
Cc: stable(a)vger.kernel.org # v5.1+
---
Hi,
This is a version of commit a923a2676e60 for 5.4-stable and before (where
the SYNC instructions mentioned in the description have not been added yet
and hence the merge conflict). No functional change, just a mechanical
update. Verified to build. Please apply.
Maciej
---
arch/mips/include/asm/cmpxchg.h | 3 +++
1 file changed, 3 insertions(+)
Index: linux-5.4-test/arch/mips/include/asm/cmpxchg.h
===================================================================
--- linux-5.4-test.orig/arch/mips/include/asm/cmpxchg.h
+++ linux-5.4-test/arch/mips/include/asm/cmpxchg.h
@@ -239,6 +239,7 @@ static inline unsigned long __cmpxchg64(
" .set " MIPS_ISA_ARCH_LEVEL " \n"
/* Load 64 bits from ptr */
"1: lld %L0, %3 # __cmpxchg64 \n"
+ " .set pop \n"
/*
* Split the 64 bit value we loaded into the 2 registers that hold the
* ret variable.
@@ -266,6 +267,8 @@ static inline unsigned long __cmpxchg64(
" or %L1, %L1, $at \n"
" .set at \n"
# endif
+ " .set push \n"
+ " .set " MIPS_ISA_ARCH_LEVEL " \n"
/* Attempt to store new at ptr */
" scd %L1, %2 \n"
/* If we failed, loop! */
From: Orson Zhai <orson.zhai(a)unisoc.com>
Hi Greg,
Following 2 patches were merged into 5.10.y but not in 5.4.y.
We've found kernel crashes on our devices with 5.4 stable caused by missing them.
Please feel free to add them into the stable queue for 5.4.y if no issue.
Thanks,
Orson
Adrian Hunter (1):
scsi: ufs: Fix interrupt error message for shared interrupts
Jaegeuk Kim (1):
scsi: ufs: Fix tm request when non-fatal error happens
drivers/scsi/ufs/ufshcd.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
--
2.7.4
Hi Greg,
We see the following build warning/error in v5.4.160.
drivers/soc/tegra/pmc.c:612:1: error: unused label 'powergate_off'
The problem isn't that the label is left-over, the problem is
that upstream commit 19221e308302 ("soc/tegra: pmc: Fix imbalanced
clock disabling in error code path") is missing in v5.4.y. Please apply.
Thanks,
Guenter
Hi,
I see the following build failure in v4.9.y and v4.14.y stable queues.
arch/s390/mm/gmap.c: In function '__gmap_zap':
arch/s390/mm/gmap.c:665:9: error: implicit declaration of function 'vma_lookup'
In v4.14.y, there is an additional failure:
arch/s390/mm/pgtable.c: In function 'pgste_perform_essa':
arch/s390/mm/pgtable.c:910:8: error: implicit declaration of function 'vma_lookup'
Guenter
This is a note to let you know that I've just added the patch titled
usb: hub: Fix usb enumeration issue due to address0 race
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 6ae6dc22d2d1ce6aa77a6da8a761e61aca216f8b Mon Sep 17 00:00:00 2001
From: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Date: Tue, 16 Nov 2021 00:16:30 +0200
Subject: usb: hub: Fix usb enumeration issue due to address0 race
xHC hardware can only have one slot in default state with address 0
waiting for a unique address at a time, otherwise "undefined behavior
may occur" according to xhci spec 5.4.3.4
The address0_mutex exists to prevent this across both xhci roothubs.
If hub_port_init() fails, it may unlock the mutex and exit with a xhci
slot in default state. If the other xhci roothub calls hub_port_init()
at this point we end up with two slots in default state.
Make sure the address0_mutex protects the slot default state across
hub_port_init() retries, until slot is addressed or disabled.
Note, one known minor case is not fixed by this patch.
If device needs to be reset during resume, but fails all hub_port_init()
retries in usb_reset_and_verify_device(), then it's possible the slot is
still left in default state when address0_mutex is unlocked.
Cc: <stable(a)vger.kernel.org>
Fixes: 638139eb95d2 ("usb: hub: allow to process more usb hub events in parallel")
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Link: https://lore.kernel.org/r/20211115221630.871204-1-mathias.nyman@linux.intel…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/core/hub.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 86658a81d284..00c3506324e4 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -4700,8 +4700,6 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1,
if (oldspeed == USB_SPEED_LOW)
delay = HUB_LONG_RESET_TIME;
- mutex_lock(hcd->address0_mutex);
-
/* Reset the device; full speed may morph to high speed */
/* FIXME a USB 2.0 device may morph into SuperSpeed on reset. */
retval = hub_port_reset(hub, port1, udev, delay, false);
@@ -5016,7 +5014,6 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1,
hub_port_disable(hub, port1, 0);
update_devnum(udev, devnum); /* for disconnect processing */
}
- mutex_unlock(hcd->address0_mutex);
return retval;
}
@@ -5246,6 +5243,9 @@ static void hub_port_connect(struct usb_hub *hub, int port1, u16 portstatus,
unit_load = 100;
status = 0;
+
+ mutex_lock(hcd->address0_mutex);
+
for (i = 0; i < PORT_INIT_TRIES; i++) {
/* reallocate for each attempt, since references
@@ -5282,6 +5282,8 @@ static void hub_port_connect(struct usb_hub *hub, int port1, u16 portstatus,
if (status < 0)
goto loop;
+ mutex_unlock(hcd->address0_mutex);
+
if (udev->quirks & USB_QUIRK_DELAY_INIT)
msleep(2000);
@@ -5370,6 +5372,7 @@ static void hub_port_connect(struct usb_hub *hub, int port1, u16 portstatus,
loop_disable:
hub_port_disable(hub, port1, 1);
+ mutex_lock(hcd->address0_mutex);
loop:
usb_ep0_reinit(udev);
release_devnum(udev);
@@ -5396,6 +5399,8 @@ static void hub_port_connect(struct usb_hub *hub, int port1, u16 portstatus,
}
done:
+ mutex_unlock(hcd->address0_mutex);
+
hub_port_disable(hub, port1, 1);
if (hcd->driver->relinquish_port && !hub->hdev->parent) {
if (status != -ENOTCONN && status != -ENODEV)
@@ -5915,6 +5920,8 @@ static int usb_reset_and_verify_device(struct usb_device *udev)
bos = udev->bos;
udev->bos = NULL;
+ mutex_lock(hcd->address0_mutex);
+
for (i = 0; i < PORT_INIT_TRIES; ++i) {
/* ep0 maxpacket size may change; let the HCD know about it.
@@ -5924,6 +5931,7 @@ static int usb_reset_and_verify_device(struct usb_device *udev)
if (ret >= 0 || ret == -ENOTCONN || ret == -ENODEV)
break;
}
+ mutex_unlock(hcd->address0_mutex);
if (ret < 0)
goto re_enumerate;
--
2.34.0
This is a note to let you know that I've just added the patch titled
usb: chipidea: ci_hdrc_imx: fix potential error pointer dereference
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From d4d2e5329ae9dfd6742c84d79f7d143d10410f1b Mon Sep 17 00:00:00 2001
From: Dan Carpenter <dan.carpenter(a)oracle.com>
Date: Wed, 17 Nov 2021 10:49:23 +0300
Subject: usb: chipidea: ci_hdrc_imx: fix potential error pointer dereference
in probe
If the first call to devm_usb_get_phy_by_phandle(dev, "fsl,usbphy", 0)
fails with something other than -ENODEV then it leads to an error
pointer dereference. For those errors we should just jump directly to
the error handling.
Fixes: 8253a34bfae3 ("usb: chipidea: ci_hdrc_imx: Also search for 'phys' phandle")
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Link: https://lore.kernel.org/r/20211117074923.GF5237@kili
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/chipidea/ci_hdrc_imx.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c b/drivers/usb/chipidea/ci_hdrc_imx.c
index f1d100671ee6..097142ffb184 100644
--- a/drivers/usb/chipidea/ci_hdrc_imx.c
+++ b/drivers/usb/chipidea/ci_hdrc_imx.c
@@ -420,15 +420,15 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
data->phy = devm_usb_get_phy_by_phandle(dev, "fsl,usbphy", 0);
if (IS_ERR(data->phy)) {
ret = PTR_ERR(data->phy);
- if (ret == -ENODEV) {
- data->phy = devm_usb_get_phy_by_phandle(dev, "phys", 0);
- if (IS_ERR(data->phy)) {
- ret = PTR_ERR(data->phy);
- if (ret == -ENODEV)
- data->phy = NULL;
- else
- goto err_clk;
- }
+ if (ret != -ENODEV)
+ goto err_clk;
+ data->phy = devm_usb_get_phy_by_phandle(dev, "phys", 0);
+ if (IS_ERR(data->phy)) {
+ ret = PTR_ERR(data->phy);
+ if (ret == -ENODEV)
+ data->phy = NULL;
+ else
+ goto err_clk;
}
}
--
2.34.0
This is a note to let you know that I've just added the patch titled
usb: typec: fusb302: Fix masking of comparator and bc_lvl interrupts
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 362468830dd5bea8bf6ad5203b2ea61f8a4e8288 Mon Sep 17 00:00:00 2001
From: Ondrej Jirman <megous(a)megous.com>
Date: Mon, 8 Nov 2021 11:28:32 +0100
Subject: usb: typec: fusb302: Fix masking of comparator and bc_lvl interrupts
The code that enables either BC_LVL or COMP_CHNG interrupt in tcpm_set_cc
wrongly assumes that the interrupt is unmasked by writing 1 to the apropriate
bit in the mask register. In fact, interrupts are enabled when the mask
is 0, so the tcpm_set_cc enables interrupt for COMP_CHNG when it expects
BC_LVL interrupt to be enabled.
This causes inability of the driver to recognize cable unplug events
in host mode (unplug is recognized only via a COMP_CHNG interrupt).
In device mode this bug was masked by simultaneous triggering of the VBUS
change interrupt, because of loss of VBUS when the port peer is providing
power.
Fixes: 48242e30532b ("usb: typec: fusb302: Revert "Resolve fixed power role contract setup"")
Cc: stable <stable(a)vger.kernel.org>
Cc: Hans de Goede <hdegoede(a)redhat.com>
Reviewed-by: Hans de Goede <hdegoede(a)redhat.com>
Acked-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Signed-off-by: Ondrej Jirman <megous(a)megous.com>
Link: https://lore.kernel.org/r/20211108102833.2793803-1-megous@megous.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/typec/tcpm/fusb302.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/typec/tcpm/fusb302.c b/drivers/usb/typec/tcpm/fusb302.c
index 7a2a17866a82..72f9001b0792 100644
--- a/drivers/usb/typec/tcpm/fusb302.c
+++ b/drivers/usb/typec/tcpm/fusb302.c
@@ -669,25 +669,27 @@ static int tcpm_set_cc(struct tcpc_dev *dev, enum typec_cc_status cc)
ret = fusb302_i2c_mask_write(chip, FUSB_REG_MASK,
FUSB_REG_MASK_BC_LVL |
FUSB_REG_MASK_COMP_CHNG,
- FUSB_REG_MASK_COMP_CHNG);
+ FUSB_REG_MASK_BC_LVL);
if (ret < 0) {
fusb302_log(chip, "cannot set SRC interrupt, ret=%d",
ret);
goto done;
}
chip->intr_comp_chng = true;
+ chip->intr_bc_lvl = false;
break;
case TYPEC_CC_RD:
ret = fusb302_i2c_mask_write(chip, FUSB_REG_MASK,
FUSB_REG_MASK_BC_LVL |
FUSB_REG_MASK_COMP_CHNG,
- FUSB_REG_MASK_BC_LVL);
+ FUSB_REG_MASK_COMP_CHNG);
if (ret < 0) {
fusb302_log(chip, "cannot set SRC interrupt, ret=%d",
ret);
goto done;
}
chip->intr_bc_lvl = true;
+ chip->intr_comp_chng = false;
break;
default:
break;
--
2.34.0
This is a note to let you know that I've just added the patch titled
usb: dwc3: leave default DMA for PCI devices
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 47ce45906ca9870cf5267261f155fb7c70307cf0 Mon Sep 17 00:00:00 2001
From: Fabio Aiuto <fabioaiuto83(a)gmail.com>
Date: Sat, 13 Nov 2021 15:29:59 +0100
Subject: usb: dwc3: leave default DMA for PCI devices
in case of a PCI dwc3 controller, leave the default DMA
mask. Calling of a 64 bit DMA mask breaks the driver on
cherrytrail based tablets like Cyberbook T116.
Fixes: 45d39448b4d0 ("usb: dwc3: support 64 bit DMA in platform driver")
Cc: stable <stable(a)vger.kernel.org>
Reported-by: Hans De Goede <hdegoede(a)redhat.com>
Tested-by: Fabio Aiuto <fabioaiuto83(a)gmail.com>
Tested-by: Hans de Goede <hdegoede(a)redhat.com>
Signed-off-by: Fabio Aiuto <fabioaiuto83(a)gmail.com>
Link: https://lore.kernel.org/r/20211113142959.27191-1-fabioaiuto83@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/dwc3/core.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 643239d7d370..f4c09951b517 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1594,9 +1594,11 @@ static int dwc3_probe(struct platform_device *pdev)
dwc3_get_properties(dwc);
- ret = dma_set_mask_and_coherent(dwc->sysdev, DMA_BIT_MASK(64));
- if (ret)
- return ret;
+ if (!dwc->sysdev_is_parent) {
+ ret = dma_set_mask_and_coherent(dwc->sysdev, DMA_BIT_MASK(64));
+ if (ret)
+ return ret;
+ }
dwc->reset = devm_reset_control_array_get_optional_shared(dev);
if (IS_ERR(dwc->reset))
--
2.34.0
This is a note to let you know that I've just added the patch titled
usb: dwc2: hcd_queue: Fix use of floating point literal
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 310780e825f3ffd211b479b8f828885a6faedd63 Mon Sep 17 00:00:00 2001
From: Nathan Chancellor <nathan(a)kernel.org>
Date: Fri, 5 Nov 2021 07:58:03 -0700
Subject: usb: dwc2: hcd_queue: Fix use of floating point literal
A new commit in LLVM causes an error on the use of 'long double' when
'-mno-x87' is used, which the kernel does through an alias,
'-mno-80387' (see the LLVM commit below for more details around why it
does this).
drivers/usb/dwc2/hcd_queue.c:1744:25: error: expression requires 'long double' type support, but target 'x86_64-unknown-linux-gnu' does not support it
delay = ktime_set(0, DWC2_RETRY_WAIT_DELAY);
^
drivers/usb/dwc2/hcd_queue.c:62:34: note: expanded from macro 'DWC2_RETRY_WAIT_DELAY'
#define DWC2_RETRY_WAIT_DELAY (1 * 1E6L)
^
1 error generated.
This happens due to the use of a 'long double' literal. The 'E6' part of
'1E6L' causes the literal to be a 'double' then the 'L' suffix promotes
it to 'long double'.
There is no visible reason for a floating point value in this driver, as
the value is only used as a parameter to a function that expects an
integer type. Use NSEC_PER_MSEC, which is the same integer value as
'1E6L', to avoid changing functionality but fix the error.
Link: https://github.com/ClangBuiltLinux/linux/issues/1497
Link: https://github.com/llvm/llvm-project/commit/a8083d42b1c346e21623a1d36d1f0ca…
Fixes: 6ed30a7d8ec2 ("usb: dwc2: host: use hrtimer for NAK retries")
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers(a)google.com>
Reviewed-by: John Keeping <john(a)metanate.com>
Acked-by: Minas Harutyunyan <Minas.Harutyunyan(a)synopsys.com>
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Link: https://lore.kernel.org/r/20211105145802.2520658-1-nathan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/dwc2/hcd_queue.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c
index 89a788326c56..24beff610cf2 100644
--- a/drivers/usb/dwc2/hcd_queue.c
+++ b/drivers/usb/dwc2/hcd_queue.c
@@ -59,7 +59,7 @@
#define DWC2_UNRESERVE_DELAY (msecs_to_jiffies(5))
/* If we get a NAK, wait this long before retrying */
-#define DWC2_RETRY_WAIT_DELAY (1 * 1E6L)
+#define DWC2_RETRY_WAIT_DELAY (1 * NSEC_PER_MSEC)
/**
* dwc2_periodic_channel_available() - Checks that a channel is available for a
--
2.34.0