This is a good example that emphasizes that the order in which patches
are queued to stable matters. More details in the revert commit.
Tested and intended for 4.14, 4.19, 5.4, 5.10.
Baokun Li (1):
ext4: fix use-after-free in ext4_xattr_set_entry
Ritesh Harjani (1):
ext4: remove duplicate definition of ext4_xattr_ibody_inline_set()
Tudor Ambarus (1):
Revert "ext4: fix use-after-free in ext4_xattr_set_entry"
fs/ext4/inline.c | 11 +++++------
fs/ext4/xattr.c | 26 +-------------------------
fs/ext4/xattr.h | 6 +++---
3 files changed, 9 insertions(+), 34 deletions(-)
--
2.40.0.634.g4ca3ef3211-goog
From: Pingfan Liu <kernelfans(a)gmail.com>
Purgatory.ro is a standalone binary that is not linked against the rest of
the kernel. Its image is copied into an array that is linked to the
kernel, and from there kexec relocates it wherever it desires.
Unlike the debug info for vmlinux, which can be used for analyzing crash
such info is useless in purgatory.ro. And discarding them can save about
200K space.
Original:
259080 kexec-purgatory.o
Stripped debug info:
29152 kexec-purgatory.o
Signed-off-by: Pingfan Liu <kernelfans(a)gmail.com>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers(a)google.com>
Reviewed-by: Steve Wahl <steve.wahl(a)hpe.com>
Acked-by: Dave Young <dyoung(a)redhat.com>
Link: https://lore.kernel.org/r/1596433788-3784-1-git-send-email-kernelfans@gmail…
(cherry picked from commit 52416ffcf823ee11aa19792715664ab94757f111)
Signed-off-by: Alyssa Ross <hi(a)alyssa.is>
---
arch/x86/purgatory/Makefile | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 2f15a2ac4209..2040ddb824c2 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -20,6 +20,9 @@ KBUILD_CFLAGS := -fno-strict-aliasing -Wall -Wstrict-prototypes -fno-zero-initia
KBUILD_CFLAGS += -m$(BITS)
KBUILD_CFLAGS += $(call cc-option,-fno-PIE)
+AFLAGS_REMOVE_setup-x86_$(BITS).o += -Wa,-gdwarf-2
+AFLAGS_REMOVE_entry64.o += -Wa,-gdwarf-2
+
$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
--
2.37.1
The patch below does not apply to the 6.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.2.y
git checkout FETCH_HEAD
git cherry-pick -x 16c52e503043aed1e2a2ce38d9249de5936c1f6b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042213-overbid-jitters-7a29@gregkh' --subject-prefix 'PATCH 6.2.y' HEAD^..
Possible dependencies:
16c52e503043 ("LoongArch: Make WriteCombine configurable for ioremap()")
41596803302d ("LoongArch: Make -mstrict-align configurable")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 16c52e503043aed1e2a2ce38d9249de5936c1f6b Mon Sep 17 00:00:00 2001
From: Huacai Chen <chenhuacai(a)kernel.org>
Date: Tue, 18 Apr 2023 19:38:58 +0800
Subject: [PATCH] LoongArch: Make WriteCombine configurable for ioremap()
LoongArch maintains cache coherency in hardware, but when paired with
LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar
to WriteCombine) is out of the scope of cache coherency machanism for
PCIe devices (this is a PCIe protocol violation, which may be fixed in
newer chipsets).
This means WUC can only used for write-only memory regions now, so this
option is disabled by default, making WUC silently fallback to SUC for
ioremap(). You can enable this option if the kernel is ensured to run on
hardware without this bug.
Kernel parameter writecombine=on/off can be used to override the Kconfig
option.
Cc: stable(a)vger.kernel.org
Suggested-by: WANG Xuerui <kernel(a)xen0n.name>
Reviewed-by: WANG Xuerui <kernel(a)xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst
index 19600c50277b..6ae5f129fbca 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -128,6 +128,7 @@ parameter is applicable::
KVM Kernel Virtual Machine support is enabled.
LIBATA Libata driver is enabled
LP Printer support is enabled.
+ LOONGARCH LoongArch architecture is enabled.
LOOP Loopback device support is enabled.
M68k M68k architecture is enabled.
These options have more detailed description inside of
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6221a1d057dd..7016cb12dc4e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6933,6 +6933,12 @@
When enabled, memory and cache locality will be
impacted.
+ writecombine= [LOONGARCH] Control the MAT (Memory Access Type) of
+ ioremap_wc().
+
+ on - Enable writecombine, use WUC for ioremap_wc()
+ off - Disable writecombine, use SUC for ioremap_wc()
+
x2apic_phys [X86-64,APIC] Use x2apic physical mode instead of
default x2apic cluster mode on platforms
supporting x2apic.
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 7fd51257e0ed..3ddde336e6a5 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -447,6 +447,22 @@ config ARCH_IOREMAP
protection support. However, you can enable LoongArch DMW-based
ioremap() for better performance.
+config ARCH_WRITECOMBINE
+ bool "Enable WriteCombine (WUC) for ioremap()"
+ help
+ LoongArch maintains cache coherency in hardware, but when paired
+ with LS7A chipsets the WUC attribute (Weak-ordered UnCached, which
+ is similar to WriteCombine) is out of the scope of cache coherency
+ machanism for PCIe devices (this is a PCIe protocol violation, which
+ may be fixed in newer chipsets).
+
+ This means WUC can only used for write-only memory regions now, so
+ this option is disabled by default, making WUC silently fallback to
+ SUC for ioremap(). You can enable this option if the kernel is ensured
+ to run on hardware without this bug.
+
+ You can override this setting via writecombine=on/off boot parameter.
+
config ARCH_STRICT_ALIGN
bool "Enable -mstrict-align to prevent unaligned accesses" if EXPERT
default y
diff --git a/arch/loongarch/include/asm/io.h b/arch/loongarch/include/asm/io.h
index 402a7d9e3a53..545e2708fbf7 100644
--- a/arch/loongarch/include/asm/io.h
+++ b/arch/loongarch/include/asm/io.h
@@ -54,8 +54,10 @@ static inline void __iomem *ioremap_prot(phys_addr_t offset, unsigned long size,
* @offset: bus address of the memory
* @size: size of the resource to map
*/
+extern pgprot_t pgprot_wc;
+
#define ioremap_wc(offset, size) \
- ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL_WUC))
+ ioremap_prot((offset), (size), pgprot_val(pgprot_wc))
#define ioremap_cache(offset, size) \
ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL))
diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
index bae84ccf6d36..27f71f9531e1 100644
--- a/arch/loongarch/kernel/setup.c
+++ b/arch/loongarch/kernel/setup.c
@@ -160,6 +160,27 @@ static void __init smbios_parse(void)
dmi_walk(find_tokens, NULL);
}
+#ifdef CONFIG_ARCH_WRITECOMBINE
+pgprot_t pgprot_wc = PAGE_KERNEL_WUC;
+#else
+pgprot_t pgprot_wc = PAGE_KERNEL_SUC;
+#endif
+
+EXPORT_SYMBOL(pgprot_wc);
+
+static int __init setup_writecombine(char *p)
+{
+ if (!strcmp(p, "on"))
+ pgprot_wc = PAGE_KERNEL_WUC;
+ else if (!strcmp(p, "off"))
+ pgprot_wc = PAGE_KERNEL_SUC;
+ else
+ pr_warn("Unknown writecombine setting \"%s\".\n", p);
+
+ return 0;
+}
+early_param("writecombine", setup_writecombine);
+
static int usermem __initdata;
static int __init early_parse_mem(char *p)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x a25bc8486f9c01c1af6b6c5657234b2eee2c39d6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042351-feline-gas-e7b7@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
a25bc8486f9c ("KVM: arm64: Fix buffer overflow in kvm_arm_set_fw_reg()")
85fbe08e4da8 ("KVM: arm64: Factor out firmware register handling from psci.c")
1ebdbeb03efe ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a25bc8486f9c01c1af6b6c5657234b2eee2c39d6 Mon Sep 17 00:00:00 2001
From: Dan Carpenter <dan.carpenter(a)linaro.org>
Date: Wed, 19 Apr 2023 13:16:13 +0300
Subject: [PATCH] KVM: arm64: Fix buffer overflow in kvm_arm_set_fw_reg()
The KVM_REG_SIZE() comes from the ioctl and it can be a power of two
between 0-32768 but if it is more than sizeof(long) this will corrupt
memory.
Fixes: 99adb567632b ("KVM: arm/arm64: Add save/restore support for firmware workaround state")
Signed-off-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Reviewed-by: Steven Price <steven.price(a)arm.com>
Reviewed-by: Eric Auger <eric.auger(a)redhat.com>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Link: https://lore.kernel.org/r/4efbab8c-640f-43b2-8ac6-6d68e08280fe@kili.mountain
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 5da884e11337..c4b4678bc4a4 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -397,6 +397,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
u64 val;
int wa_level;
+ if (KVM_REG_SIZE(reg->id) != sizeof(val))
+ return -ENOENT;
if (copy_from_user(&val, uaddr, KVM_REG_SIZE(reg->id)))
return -EFAULT;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x a25bc8486f9c01c1af6b6c5657234b2eee2c39d6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042350-coveted-collapse-44f5@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
a25bc8486f9c ("KVM: arm64: Fix buffer overflow in kvm_arm_set_fw_reg()")
85fbe08e4da8 ("KVM: arm64: Factor out firmware register handling from psci.c")
1ebdbeb03efe ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a25bc8486f9c01c1af6b6c5657234b2eee2c39d6 Mon Sep 17 00:00:00 2001
From: Dan Carpenter <dan.carpenter(a)linaro.org>
Date: Wed, 19 Apr 2023 13:16:13 +0300
Subject: [PATCH] KVM: arm64: Fix buffer overflow in kvm_arm_set_fw_reg()
The KVM_REG_SIZE() comes from the ioctl and it can be a power of two
between 0-32768 but if it is more than sizeof(long) this will corrupt
memory.
Fixes: 99adb567632b ("KVM: arm/arm64: Add save/restore support for firmware workaround state")
Signed-off-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Reviewed-by: Steven Price <steven.price(a)arm.com>
Reviewed-by: Eric Auger <eric.auger(a)redhat.com>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Link: https://lore.kernel.org/r/4efbab8c-640f-43b2-8ac6-6d68e08280fe@kili.mountain
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 5da884e11337..c4b4678bc4a4 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -397,6 +397,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
u64 val;
int wa_level;
+ if (KVM_REG_SIZE(reg->id) != sizeof(val))
+ return -ENOENT;
if (copy_from_user(&val, uaddr, KVM_REG_SIZE(reg->id)))
return -EFAULT;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x a25bc8486f9c01c1af6b6c5657234b2eee2c39d6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042349-cackle-parasite-ff0b@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
a25bc8486f9c ("KVM: arm64: Fix buffer overflow in kvm_arm_set_fw_reg()")
85fbe08e4da8 ("KVM: arm64: Factor out firmware register handling from psci.c")
1ebdbeb03efe ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a25bc8486f9c01c1af6b6c5657234b2eee2c39d6 Mon Sep 17 00:00:00 2001
From: Dan Carpenter <dan.carpenter(a)linaro.org>
Date: Wed, 19 Apr 2023 13:16:13 +0300
Subject: [PATCH] KVM: arm64: Fix buffer overflow in kvm_arm_set_fw_reg()
The KVM_REG_SIZE() comes from the ioctl and it can be a power of two
between 0-32768 but if it is more than sizeof(long) this will corrupt
memory.
Fixes: 99adb567632b ("KVM: arm/arm64: Add save/restore support for firmware workaround state")
Signed-off-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Reviewed-by: Steven Price <steven.price(a)arm.com>
Reviewed-by: Eric Auger <eric.auger(a)redhat.com>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Link: https://lore.kernel.org/r/4efbab8c-640f-43b2-8ac6-6d68e08280fe@kili.mountain
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 5da884e11337..c4b4678bc4a4 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -397,6 +397,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
u64 val;
int wa_level;
+ if (KVM_REG_SIZE(reg->id) != sizeof(val))
+ return -ENOENT;
if (copy_from_user(&val, uaddr, KVM_REG_SIZE(reg->id)))
return -EFAULT;
The data->block[0] variable comes from user and is a number between
0-255. Without proper check, the variable may be very large to cause
an out-of-bounds when performing memcpy in slimpro_i2c_blkwr.
Fix this bug by checking the value of writelen.
Fixes: f6505fbabc42 ("i2c: add SLIMpro I2C device driver on APM X-Gene platform")
Signed-off-by: Wei Chen <harperchen1110(a)gmail.com>
Cc: stable(a)vger.kernel.org
---
Changes in v2:
- Put length check inside slimpro_i2c_blkwr
Changes in v3:
- Correct the format of patch
Changes in v4:
- CC stable email address
drivers/i2c/busses/i2c-xgene-slimpro.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/i2c/busses/i2c-xgene-slimpro.c b/drivers/i2c/busses/i2c-xgene-slimpro.c
index bc9a3e7e0c96..0f7263e2276a 100644
--- a/drivers/i2c/busses/i2c-xgene-slimpro.c
+++ b/drivers/i2c/busses/i2c-xgene-slimpro.c
@@ -308,6 +308,9 @@ static int slimpro_i2c_blkwr(struct slimpro_i2c_dev *ctx, u32 chip,
u32 msg[3];
int rc;
+ if (writelen > I2C_SMBUS_BLOCK_MAX)
+ return -EINVAL;
+
memcpy(ctx->dma_buffer, data, writelen);
paddr = dma_map_single(ctx->dev, ctx->dma_buffer, writelen,
DMA_TO_DEVICE);
--
2.25.1
OverCurrent condition is not standardized in the UHCI spec.
Zhaoxin UHCI controllers report OverCurrent bit active off.
In order to handle OverCurrent condition correctly, the uhci-hcd
driver needs to be told to expect the active-off behavior.
Suggested-by: Alan Stern <stern(a)rowland.harvard.edu>
Cc: stable(a)vger.kernel.org
Signed-off-by: Weitao Wang <WeitaoWang-oc(a)zhaoxin.com>
---
v1->v2
- Modify the description of this patch.
- Let Zhaoxin and VIA share a common oc_low flag
drivers/usb/host/uhci-pci.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/host/uhci-pci.c b/drivers/usb/host/uhci-pci.c
index 3592f757fe05..034586911bb5 100644
--- a/drivers/usb/host/uhci-pci.c
+++ b/drivers/usb/host/uhci-pci.c
@@ -119,11 +119,12 @@ static int uhci_pci_init(struct usb_hcd *hcd)
uhci->rh_numports = uhci_count_ports(hcd);
- /* Intel controllers report the OverCurrent bit active on.
- * VIA controllers report it active off, so we'll adjust the
- * bit value. (It's not standardized in the UHCI spec.)
+ /* Intel controllers report the OverCurrent bit active on. VIA
+ * and ZHAOXIN controllers report it active off, so we'll adjust
+ * the bit value. (It's not standardized in the UHCI spec.)
*/
- if (to_pci_dev(uhci_dev(uhci))->vendor == PCI_VENDOR_ID_VIA)
+ if (to_pci_dev(uhci_dev(uhci))->vendor == PCI_VENDOR_ID_VIA ||
+ to_pci_dev(uhci_dev(uhci))->vendor == PCI_VENDOR_ID_ZHAOXIN)
uhci->oc_low = 1;
/* HP's server management chip requires a longer port reset delay. */
--
2.32.0
This is just a resend of
https://lore.kernel.org/stable/20230318173943.3188213-1-qyousef@layalina.io/
Changes in v2:
* Fix compilation error against patch 7 due to misiplace #endif to
protect against CONFIG_SMP which doesn't contain the newly added
field to struct rq.
Commit 2ff401441711 ("sched/uclamp: Fix relationship between uclamp and
migration margin") was cherry-picked into 5.10 kernels but missed the rest of
the series.
This ports the remainder of the fixes.
Based on 5.10.172.
NOTE:
a2e90611b9f4 ("sched/fair: Remove capacity inversion detection") is not
necessary to backport because it has a dependency on e5ed0550c04c ("sched/fair:
unlink misfit task from cpu overutilized") which is nice to have but not
strictly required. It improves the search for best CPU under adverse thermal
pressure to try harder. And the new search effectively replaces the capacity
inversion detection, so it is removed afterwards.
Build tested on (cross compile when necessary; x86_64 otherwise):
1. default ubuntu config which has uclamp + smp
2. default ubuntu config without uclamp + smp
3. default ubunto config without smp (which automatically disables
uclamp)
4. reported riscv-allnoconfig, mips-randconfig, x86_64-randocnfigs
Tested on 5.10 Android GKI kernel and android device (with slight modifications
due to other conflicts on there).
Qais Yousef (10):
sched/uclamp: Make task_fits_capacity() use util_fits_cpu()
sched/uclamp: Fix fits_capacity() check in feec()
sched/uclamp: Make select_idle_capacity() use util_fits_cpu()
sched/uclamp: Make asym_fits_capacity() use util_fits_cpu()
sched/uclamp: Make cpu_overutilized() use util_fits_cpu()
sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early
exit condition
sched/fair: Detect capacity inversion
sched/fair: Consider capacity inversion in util_fits_cpu()
sched/uclamp: Fix a uninitialized variable warnings
sched/fair: Fixes for capacity inversion detection
kernel/sched/core.c | 10 +--
kernel/sched/fair.c | 183 ++++++++++++++++++++++++++++++++++---------
kernel/sched/sched.h | 70 ++++++++++++++++-
3 files changed, 217 insertions(+), 46 deletions(-)
--
2.25.1
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x f4e9e0e69468583c2c6d9d5c7bfc975e292bf188
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042254-platinum-service-2032@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
f4e9e0e69468 ("mm/mempolicy: fix use-after-free of VMA iterator")
9760ebffbf55 ("mm: switch vma_merge(), split_vma(), and __split_vma to vma iterator")
47d9644de92c ("nommu: convert nommu to using the vma iterator")
a27a11f92fe2 ("mm/mremap: use vmi version of vma_merge()")
076f16bf7698 ("mmap: use vmi version of vma_merge()")
0c0c5bffd0a2 ("mmap: pass through vmi iterator to __split_vma()")
178e22ac2078 ("madvise: use vmi iterator for __split_vma() and vma_merge()")
f10c2abcdac4 ("mempolicy: convert to vma iterator")
37598f5a9d8b ("mlock: convert mlock to vma iterator")
2286a6914c77 ("mm: change mprotect_fixup to vma iterator")
11a9b90274f6 ("userfaultfd: use vma iterator")
f2ebfe43ba6c ("mm: add temporary vma iterator versions of vma_merge(), split_vma(), and __split_vma()")
183654ce26a5 ("mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator")
0378c0a0e9e4 ("mm/mmap: remove preallocation from do_mas_align_munmap()")
92fed82047d7 ("mm/mmap: convert brk to use vma iterator")
baabcfc93d3b ("mm/mmap: fix typo in comment")
c5d5546ea065 ("maple_tree: remove the parameter entry of mas_preallocate")
5ab0fc155dc0 ("Sync mm-stable with mm-hotfixes-stable to pick up dependent patches")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f4e9e0e69468583c2c6d9d5c7bfc975e292bf188 Mon Sep 17 00:00:00 2001
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Date: Mon, 10 Apr 2023 11:22:05 -0400
Subject: [PATCH] mm/mempolicy: fix use-after-free of VMA iterator
set_mempolicy_home_node() iterates over a list of VMAs and calls
mbind_range() on each VMA, which also iterates over the singular list of
the VMA passed in and potentially splits the VMA. Since the VMA iterator
is not passed through, set_mempolicy_home_node() may now point to a stale
node in the VMA tree. This can result in a UAF as reported by syzbot.
Avoid the stale maple tree node by passing the VMA iterator through to the
underlying call to split_vma().
mbind_range() is also overly complicated, since there are two calling
functions and one already handles iterating over the VMAs. Simplify
mbind_range() to only handle merging and splitting of the VMAs.
Align the new loop in do_mbind() and existing loop in
set_mempolicy_home_node() to use the reduced mbind_range() function. This
allows for a single location of the range calculation and avoids
constantly looking up the previous VMA (since this is a loop over the
VMAs).
Link: https://lore.kernel.org/linux-mm/000000000000c93feb05f87e24ad@google.com/
Fixes: 66850be55e8e ("mm/mempolicy: use vma iterator & maple state instead of vma linked list")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: syzbot+a7c1ec5b1d71ceaa5186(a)syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/20230410152205.2294819-1-Liam.Howlett@oracle.com
Tested-by: syzbot+a7c1ec5b1d71ceaa5186(a)syzkaller.appspotmail.com
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a256a241fd1d..2068b594dc88 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -790,61 +790,50 @@ static int vma_replace_policy(struct vm_area_struct *vma,
return err;
}
-/* Step 2: apply policy to a range and do splits. */
-static int mbind_range(struct mm_struct *mm, unsigned long start,
- unsigned long end, struct mempolicy *new_pol)
+/* Split or merge the VMA (if required) and apply the new policy */
+static int mbind_range(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ struct vm_area_struct **prev, unsigned long start,
+ unsigned long end, struct mempolicy *new_pol)
{
- VMA_ITERATOR(vmi, mm, start);
- struct vm_area_struct *prev;
- struct vm_area_struct *vma;
- int err = 0;
+ struct vm_area_struct *merged;
+ unsigned long vmstart, vmend;
pgoff_t pgoff;
+ int err;
- prev = vma_prev(&vmi);
- vma = vma_find(&vmi, end);
- if (WARN_ON(!vma))
+ vmend = min(end, vma->vm_end);
+ if (start > vma->vm_start) {
+ *prev = vma;
+ vmstart = start;
+ } else {
+ vmstart = vma->vm_start;
+ }
+
+ if (mpol_equal(vma_policy(vma), new_pol))
return 0;
- if (start > vma->vm_start)
- prev = vma;
-
- do {
- unsigned long vmstart = max(start, vma->vm_start);
- unsigned long vmend = min(end, vma->vm_end);
-
- if (mpol_equal(vma_policy(vma), new_pol))
- goto next;
-
- pgoff = vma->vm_pgoff +
- ((vmstart - vma->vm_start) >> PAGE_SHIFT);
- prev = vma_merge(&vmi, mm, prev, vmstart, vmend, vma->vm_flags,
- vma->anon_vma, vma->vm_file, pgoff,
- new_pol, vma->vm_userfaultfd_ctx,
- anon_vma_name(vma));
- if (prev) {
- vma = prev;
- goto replace;
- }
- if (vma->vm_start != vmstart) {
- err = split_vma(&vmi, vma, vmstart, 1);
- if (err)
- goto out;
- }
- if (vma->vm_end != vmend) {
- err = split_vma(&vmi, vma, vmend, 0);
- if (err)
- goto out;
- }
-replace:
- err = vma_replace_policy(vma, new_pol);
+ pgoff = vma->vm_pgoff + ((vmstart - vma->vm_start) >> PAGE_SHIFT);
+ merged = vma_merge(vmi, vma->vm_mm, *prev, vmstart, vmend, vma->vm_flags,
+ vma->anon_vma, vma->vm_file, pgoff, new_pol,
+ vma->vm_userfaultfd_ctx, anon_vma_name(vma));
+ if (merged) {
+ *prev = merged;
+ return vma_replace_policy(merged, new_pol);
+ }
+
+ if (vma->vm_start != vmstart) {
+ err = split_vma(vmi, vma, vmstart, 1);
if (err)
- goto out;
-next:
- prev = vma;
- } for_each_vma_range(vmi, vma, end);
+ return err;
+ }
-out:
- return err;
+ if (vma->vm_end != vmend) {
+ err = split_vma(vmi, vma, vmend, 0);
+ if (err)
+ return err;
+ }
+
+ *prev = vma;
+ return vma_replace_policy(vma, new_pol);
}
/* Set the process memory policy */
@@ -1259,6 +1248,8 @@ static long do_mbind(unsigned long start, unsigned long len,
nodemask_t *nmask, unsigned long flags)
{
struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma, *prev;
+ struct vma_iterator vmi;
struct mempolicy *new;
unsigned long end;
int err;
@@ -1328,7 +1319,13 @@ static long do_mbind(unsigned long start, unsigned long len,
goto up_out;
}
- err = mbind_range(mm, start, end, new);
+ vma_iter_init(&vmi, mm, start);
+ prev = vma_prev(&vmi);
+ for_each_vma_range(vmi, vma, end) {
+ err = mbind_range(&vmi, vma, &prev, start, end, new);
+ if (err)
+ break;
+ }
if (!err) {
int nr_failed = 0;
@@ -1489,10 +1486,8 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
unsigned long, home_node, unsigned long, flags)
{
struct mm_struct *mm = current->mm;
- struct vm_area_struct *vma;
+ struct vm_area_struct *vma, *prev;
struct mempolicy *new, *old;
- unsigned long vmstart;
- unsigned long vmend;
unsigned long end;
int err = -ENOENT;
VMA_ITERATOR(vmi, mm, start);
@@ -1521,6 +1516,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
if (end == start)
return 0;
mmap_write_lock(mm);
+ prev = vma_prev(&vmi);
for_each_vma_range(vmi, vma, end) {
/*
* If any vma in the range got policy other than MPOL_BIND
@@ -1541,9 +1537,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
}
new->home_node = home_node;
- vmstart = max(start, vma->vm_start);
- vmend = min(end, vma->vm_end);
- err = mbind_range(mm, vmstart, vmend, new);
+ err = mbind_range(&vmi, vma, &prev, start, end, new);
mpol_put(new);
if (err)
break;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 24bf08c4376be417f16ceb609188b16f461b0443
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042218-hexagon-cheese-b961@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
24bf08c4376b ("mm/userfaultfd: fix uffd-wp handling for THP migration entries")
6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive")
6c54dc6c7437 ("mm/rmap: use page_move_anon_rmap() when reusing a mapped PageAnon() page exclusively")
28c5209dfd5f ("mm/rmap: pass rmap flags to hugepage_add_anon_rmap()")
f1e2db12e45b ("mm/rmap: remove do_page_add_anon_rmap()")
14f9135d5470 ("mm/rmap: convert RMAP flags to a proper distinct rmap_t type")
fb3d824d1a46 ("mm/rmap: split page_dup_rmap() into page_dup_file_rmap() and page_try_dup_anon_rmap()")
b51ad4f8679e ("mm/memory: slightly simplify copy_present_pte()")
623a1ddfeb23 ("mm/hugetlb: take src_mm->write_protect_seq in copy_hugetlb_page_range()")
3bff7e3f1f16 ("mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page()")
c145e0b47c77 ("mm: streamline COW logic in do_swap_page()")
84d60fdd3733 ("mm: slightly clarify KSM logic in do_swap_page()")
53a05ad9f21d ("mm: optimize do_wp_page() for exclusive pages in the swapcache")
9030fb0bb9d6 ("Merge tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 24bf08c4376be417f16ceb609188b16f461b0443 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Wed, 5 Apr 2023 18:02:35 +0200
Subject: [PATCH] mm/userfaultfd: fix uffd-wp handling for THP migration
entries
Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
fix uffd-wp handling for migration entries in
hugetlb_change_protection()") similarly applies to THP.
Setting/clearing uffd-wp on THP migration entries is not implemented
properly. Further, while removing migration PMDs considers the uffd-wp
bit, inserting migration PMDs does not consider the uffd-wp bit.
We have to set/clear independently of the migration entry type in
change_huge_pmd() and properly copy the uffd-wp bit in
set_pmd_migration_entry().
Verified using a simple reproducer that triggers migration of a THP, that
the set_pmd_migration_entry() no longer loses the uffd-wp bit.
Link: https://lkml.kernel.org/r/20230405160236.587705-2-david@redhat.com
Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Cc: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 032fb0ef9cd1..e3706a2b34b2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1838,10 +1838,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
if (is_swap_pmd(*pmd)) {
swp_entry_t entry = pmd_to_swp_entry(*pmd);
struct page *page = pfn_swap_entry_to_page(entry);
+ pmd_t newpmd;
VM_BUG_ON(!is_pmd_migration_entry(*pmd));
if (is_writable_migration_entry(entry)) {
- pmd_t newpmd;
/*
* A protection check is difficult so
* just be safe and disable write
@@ -1855,8 +1855,16 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
newpmd = pmd_swp_mksoft_dirty(newpmd);
if (pmd_swp_uffd_wp(*pmd))
newpmd = pmd_swp_mkuffd_wp(newpmd);
- set_pmd_at(mm, addr, pmd, newpmd);
+ } else {
+ newpmd = *pmd;
}
+
+ if (uffd_wp)
+ newpmd = pmd_swp_mkuffd_wp(newpmd);
+ else if (uffd_wp_resolve)
+ newpmd = pmd_swp_clear_uffd_wp(newpmd);
+ if (!pmd_same(*pmd, newpmd))
+ set_pmd_at(mm, addr, pmd, newpmd);
goto unlock;
}
#endif
@@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
pmdswp = swp_entry_to_pmd(entry);
if (pmd_soft_dirty(pmdval))
pmdswp = pmd_swp_mksoft_dirty(pmdswp);
+ if (pmd_uffd_wp(pmdval))
+ pmdswp = pmd_swp_mkuffd_wp(pmdswp);
set_pmd_at(mm, address, pvmw->pmd, pmdswp);
page_remove_rmap(page, vma, true);
put_page(page);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 24bf08c4376be417f16ceb609188b16f461b0443
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042216-retainer-immersion-5428@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
24bf08c4376b ("mm/userfaultfd: fix uffd-wp handling for THP migration entries")
6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive")
6c54dc6c7437 ("mm/rmap: use page_move_anon_rmap() when reusing a mapped PageAnon() page exclusively")
28c5209dfd5f ("mm/rmap: pass rmap flags to hugepage_add_anon_rmap()")
f1e2db12e45b ("mm/rmap: remove do_page_add_anon_rmap()")
14f9135d5470 ("mm/rmap: convert RMAP flags to a proper distinct rmap_t type")
fb3d824d1a46 ("mm/rmap: split page_dup_rmap() into page_dup_file_rmap() and page_try_dup_anon_rmap()")
b51ad4f8679e ("mm/memory: slightly simplify copy_present_pte()")
623a1ddfeb23 ("mm/hugetlb: take src_mm->write_protect_seq in copy_hugetlb_page_range()")
3bff7e3f1f16 ("mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page()")
c145e0b47c77 ("mm: streamline COW logic in do_swap_page()")
84d60fdd3733 ("mm: slightly clarify KSM logic in do_swap_page()")
53a05ad9f21d ("mm: optimize do_wp_page() for exclusive pages in the swapcache")
9030fb0bb9d6 ("Merge tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 24bf08c4376be417f16ceb609188b16f461b0443 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Wed, 5 Apr 2023 18:02:35 +0200
Subject: [PATCH] mm/userfaultfd: fix uffd-wp handling for THP migration
entries
Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
fix uffd-wp handling for migration entries in
hugetlb_change_protection()") similarly applies to THP.
Setting/clearing uffd-wp on THP migration entries is not implemented
properly. Further, while removing migration PMDs considers the uffd-wp
bit, inserting migration PMDs does not consider the uffd-wp bit.
We have to set/clear independently of the migration entry type in
change_huge_pmd() and properly copy the uffd-wp bit in
set_pmd_migration_entry().
Verified using a simple reproducer that triggers migration of a THP, that
the set_pmd_migration_entry() no longer loses the uffd-wp bit.
Link: https://lkml.kernel.org/r/20230405160236.587705-2-david@redhat.com
Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Cc: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 032fb0ef9cd1..e3706a2b34b2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1838,10 +1838,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
if (is_swap_pmd(*pmd)) {
swp_entry_t entry = pmd_to_swp_entry(*pmd);
struct page *page = pfn_swap_entry_to_page(entry);
+ pmd_t newpmd;
VM_BUG_ON(!is_pmd_migration_entry(*pmd));
if (is_writable_migration_entry(entry)) {
- pmd_t newpmd;
/*
* A protection check is difficult so
* just be safe and disable write
@@ -1855,8 +1855,16 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
newpmd = pmd_swp_mksoft_dirty(newpmd);
if (pmd_swp_uffd_wp(*pmd))
newpmd = pmd_swp_mkuffd_wp(newpmd);
- set_pmd_at(mm, addr, pmd, newpmd);
+ } else {
+ newpmd = *pmd;
}
+
+ if (uffd_wp)
+ newpmd = pmd_swp_mkuffd_wp(newpmd);
+ else if (uffd_wp_resolve)
+ newpmd = pmd_swp_clear_uffd_wp(newpmd);
+ if (!pmd_same(*pmd, newpmd))
+ set_pmd_at(mm, addr, pmd, newpmd);
goto unlock;
}
#endif
@@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
pmdswp = swp_entry_to_pmd(entry);
if (pmd_soft_dirty(pmdval))
pmdswp = pmd_swp_mksoft_dirty(pmdswp);
+ if (pmd_uffd_wp(pmdval))
+ pmdswp = pmd_swp_mkuffd_wp(pmdswp);
set_pmd_at(mm, address, pvmw->pmd, pmdswp);
page_remove_rmap(page, vma, true);
put_page(page);
The patch below does not apply to the 6.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.2.y
git checkout FETCH_HEAD
git cherry-pick -x 0b5dfe12755f87ec014bb4cc1930485026167430
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042210-apache-rally-cc5e@gregkh' --subject-prefix 'PATCH 6.2.y' HEAD^..
Possible dependencies:
0b5dfe12755f ("drm/amd/display: fix a divided-by-zero error")
13b90cf900ab ("drm/amd/display: fix PSR-SU/DSC interoperability support")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0b5dfe12755f87ec014bb4cc1930485026167430 Mon Sep 17 00:00:00 2001
From: Alex Hung <alex.hung(a)amd.com>
Date: Mon, 3 Apr 2023 17:45:41 +0800
Subject: [PATCH] drm/amd/display: fix a divided-by-zero error
[Why & How]
timing.dsc_cfg.num_slices_v can be zero and it is necessary to check
before using it.
This fixes the error "divide error: 0000 [#1] PREEMPT SMP NOPTI".
Reviewed-by: Aurabindo Pillai <Aurabindo.Pillai(a)amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo(a)amd.com>
Signed-off-by: Alex Hung <alex.hung(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/modules/power/power_helpers.c b/drivers/gpu/drm/amd/display/modules/power/power_helpers.c
index e39b133d05af..b56f07f99d09 100644
--- a/drivers/gpu/drm/amd/display/modules/power/power_helpers.c
+++ b/drivers/gpu/drm/amd/display/modules/power/power_helpers.c
@@ -934,6 +934,10 @@ bool psr_su_set_dsc_slice_height(struct dc *dc, struct dc_link *link,
pic_height = stream->timing.v_addressable +
stream->timing.v_border_top + stream->timing.v_border_bottom;
+
+ if (stream->timing.dsc_cfg.num_slices_v == 0)
+ return false;
+
slice_height = pic_height / stream->timing.dsc_cfg.num_slices_v;
config->dsc_slice_height = slice_height;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 0b5dfe12755f87ec014bb4cc1930485026167430
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042209-scuff-accustom-83bb@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
0b5dfe12755f ("drm/amd/display: fix a divided-by-zero error")
13b90cf900ab ("drm/amd/display: fix PSR-SU/DSC interoperability support")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0b5dfe12755f87ec014bb4cc1930485026167430 Mon Sep 17 00:00:00 2001
From: Alex Hung <alex.hung(a)amd.com>
Date: Mon, 3 Apr 2023 17:45:41 +0800
Subject: [PATCH] drm/amd/display: fix a divided-by-zero error
[Why & How]
timing.dsc_cfg.num_slices_v can be zero and it is necessary to check
before using it.
This fixes the error "divide error: 0000 [#1] PREEMPT SMP NOPTI".
Reviewed-by: Aurabindo Pillai <Aurabindo.Pillai(a)amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo(a)amd.com>
Signed-off-by: Alex Hung <alex.hung(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/modules/power/power_helpers.c b/drivers/gpu/drm/amd/display/modules/power/power_helpers.c
index e39b133d05af..b56f07f99d09 100644
--- a/drivers/gpu/drm/amd/display/modules/power/power_helpers.c
+++ b/drivers/gpu/drm/amd/display/modules/power/power_helpers.c
@@ -934,6 +934,10 @@ bool psr_su_set_dsc_slice_height(struct dc *dc, struct dc_link *link,
pic_height = stream->timing.v_addressable +
stream->timing.v_border_top + stream->timing.v_border_bottom;
+
+ if (stream->timing.dsc_cfg.num_slices_v == 0)
+ return false;
+
slice_height = pic_height / stream->timing.dsc_cfg.num_slices_v;
config->dsc_slice_height = slice_height;
The patch below does not apply to the 6.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.2.y
git checkout FETCH_HEAD
git cherry-pick -x 1e994cc0956b8dabd1b1fef315bbd722733b8aa8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042204-threefold-stingy-5bc3@gregkh' --subject-prefix 'PATCH 6.2.y' HEAD^..
Possible dependencies:
1e994cc0956b ("drm/amd/display: limit timing for single dimm memory")
71c4ca2d3b07 ("drm/amd/display: Remove stutter only configurations")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e994cc0956b8dabd1b1fef315bbd722733b8aa8 Mon Sep 17 00:00:00 2001
From: Daniel Miess <Daniel.Miess(a)amd.com>
Date: Tue, 4 Apr 2023 14:04:11 -0400
Subject: [PATCH] drm/amd/display: limit timing for single dimm memory
[Why]
1. It could hit bandwidth limitdation under single dimm
memory when connecting 8K external monitor.
2. IsSupportedVidPn got validation failed with
2K240Hz eDP + 8K24Hz external monitor.
3. It's better to filter out such combination in
EnumVidPnCofuncModality
4. For short term, filter out in dc bandwidth validation.
[How]
Force 2K@240Hz+8K@24Hz timing validation false in dc.
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas(a)amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo(a)amd.com>
Signed-off-by: Daniel Miess <Daniel.Miess(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c b/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c
index 54ed3de869d3..9ffba4c6fe55 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c
@@ -1697,6 +1697,23 @@ static void dcn314_get_panel_config_defaults(struct dc_panel_config *panel_confi
*panel_config = panel_config_defaults;
}
+static bool filter_modes_for_single_channel_workaround(struct dc *dc,
+ struct dc_state *context)
+{
+ // Filter 2K@240Hz+8K@24fps above combination timing if memory only has single dimm LPDDR
+ if (dc->clk_mgr->bw_params->vram_type == 34 && dc->clk_mgr->bw_params->num_channels < 2) {
+ int total_phy_pix_clk = 0;
+
+ for (int i = 0; i < context->stream_count; i++)
+ if (context->res_ctx.pipe_ctx[i].stream)
+ total_phy_pix_clk += context->res_ctx.pipe_ctx[i].stream->phy_pix_clk;
+
+ if (total_phy_pix_clk >= (1148928+826260)) //2K@240Hz+8K@24fps
+ return true;
+ }
+ return false;
+}
+
bool dcn314_validate_bandwidth(struct dc *dc,
struct dc_state *context,
bool fast_validate)
@@ -1712,6 +1729,9 @@ bool dcn314_validate_bandwidth(struct dc *dc,
BW_VAL_TRACE_COUNT();
+ if (filter_modes_for_single_channel_workaround(dc, context))
+ goto validate_fail;
+
DC_FP_START();
// do not support self refresh only
out = dcn30_internal_validate_bw(dc, context, pipes, &pipe_cnt, &vlevel, fast_validate, false);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 1e994cc0956b8dabd1b1fef315bbd722733b8aa8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042202-confused-runt-7bf2@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
1e994cc0956b ("drm/amd/display: limit timing for single dimm memory")
71c4ca2d3b07 ("drm/amd/display: Remove stutter only configurations")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e994cc0956b8dabd1b1fef315bbd722733b8aa8 Mon Sep 17 00:00:00 2001
From: Daniel Miess <Daniel.Miess(a)amd.com>
Date: Tue, 4 Apr 2023 14:04:11 -0400
Subject: [PATCH] drm/amd/display: limit timing for single dimm memory
[Why]
1. It could hit bandwidth limitdation under single dimm
memory when connecting 8K external monitor.
2. IsSupportedVidPn got validation failed with
2K240Hz eDP + 8K24Hz external monitor.
3. It's better to filter out such combination in
EnumVidPnCofuncModality
4. For short term, filter out in dc bandwidth validation.
[How]
Force 2K@240Hz+8K@24Hz timing validation false in dc.
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas(a)amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo(a)amd.com>
Signed-off-by: Daniel Miess <Daniel.Miess(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c b/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c
index 54ed3de869d3..9ffba4c6fe55 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_resource.c
@@ -1697,6 +1697,23 @@ static void dcn314_get_panel_config_defaults(struct dc_panel_config *panel_confi
*panel_config = panel_config_defaults;
}
+static bool filter_modes_for_single_channel_workaround(struct dc *dc,
+ struct dc_state *context)
+{
+ // Filter 2K@240Hz+8K@24fps above combination timing if memory only has single dimm LPDDR
+ if (dc->clk_mgr->bw_params->vram_type == 34 && dc->clk_mgr->bw_params->num_channels < 2) {
+ int total_phy_pix_clk = 0;
+
+ for (int i = 0; i < context->stream_count; i++)
+ if (context->res_ctx.pipe_ctx[i].stream)
+ total_phy_pix_clk += context->res_ctx.pipe_ctx[i].stream->phy_pix_clk;
+
+ if (total_phy_pix_clk >= (1148928+826260)) //2K@240Hz+8K@24fps
+ return true;
+ }
+ return false;
+}
+
bool dcn314_validate_bandwidth(struct dc *dc,
struct dc_state *context,
bool fast_validate)
@@ -1712,6 +1729,9 @@ bool dcn314_validate_bandwidth(struct dc *dc,
BW_VAL_TRACE_COUNT();
+ if (filter_modes_for_single_channel_workaround(dc, context))
+ goto validate_fail;
+
DC_FP_START();
// do not support self refresh only
out = dcn30_internal_validate_bw(dc, context, pipes, &pipe_cnt, &vlevel, fast_validate, false);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 1ba1199ec5747f475538c0d25a32804e5ba1dfde
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042229-enchilada-cheddar-abb9@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
1ba1199ec574 ("writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs")
c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
f3b6a6df38aa ("writeback, cgroup: keep list of inodes attached to bdi_writeback")
29264d92a0f1 ("writeback, cgroup: switch to rcu_work API in inode_switch_wbs()")
8826ee4fe750 ("writeback, cgroup: increment isw_nr_in_flight before grabbing an inode")
4ade5867b4b8 ("writeback, cgroup: do not switch inodes with I_WILL_FREE flag")
e30942859030 ("Merge tag 'writeback_for_v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1ba1199ec5747f475538c0d25a32804e5ba1dfde Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Mon, 10 Apr 2023 21:08:26 +0800
Subject: [PATCH] writeback, cgroup: fix null-ptr-deref write in
bdi_split_work_to_wbs
KASAN report null-ptr-deref:
==================================================================
BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
Write of size 8 at addr 0000000000000000 by task sync/943
CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
Call Trace:
<TASK>
dump_stack_lvl+0x7f/0xc0
print_report+0x2ba/0x340
kasan_report+0xc4/0x120
kasan_check_range+0x1b7/0x2e0
__kasan_check_write+0x24/0x40
bdi_split_work_to_wbs+0x5c5/0x7b0
sync_inodes_sb+0x195/0x630
sync_inodes_one_sb+0x3a/0x50
iterate_supers+0x106/0x1b0
ksys_sync+0x98/0x160
[...]
==================================================================
The race that causes the above issue is as follows:
cpu1 cpu2
-------------------------|-------------------------
inode_switch_wbs
INIT_WORK(&isw->work, inode_switch_wbs_work_fn)
queue_rcu_work(isw_wq, &isw->work)
// queue_work async
inode_switch_wbs_work_fn
wb_put_many(old_wb, nr_switched)
percpu_ref_put_many
ref->data->release(ref)
cgwb_release
queue_work(cgwb_release_wq, &wb->release_work)
// queue_work async
&wb->release_work
cgwb_release_workfn
ksys_sync
iterate_supers
sync_inodes_one_sb
sync_inodes_sb
bdi_split_work_to_wbs
kmalloc(sizeof(*work), GFP_ATOMIC)
// alloc memory failed
percpu_ref_exit
ref->data = NULL
kfree(data)
wb_get(wb)
percpu_ref_get(&wb->refcnt)
percpu_ref_get_many(ref, 1)
atomic_long_add(nr, &ref->data->count)
atomic64_add(i, v)
// trigger null-ptr-deref
bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all
wbs. If the allocation of new work fails, the on-stack fallback will be
used and the reference count of the current wb is increased afterwards.
If cgroup writeback membership switches occur before getting the reference
count and the current wb is released as old_wd, then calling wb_get() or
wb_put() will trigger the null pointer dereference above.
This issue was introduced in v4.3-rc7 (see fix tag1). Both
sync_inodes_sb() and __writeback_inodes_sb_nr() calls to
bdi_split_work_to_wbs() can trigger this issue. For scenarios called via
sync_inodes_sb(), originally commit 7fc5854f8c6e ("writeback: synchronize
sync(2) against cgroup writeback membership switches") reduced the
possibility of the issue by adding wb_switch_rwsem, but in v5.14-rc1 (see
fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" from
inode_switch_wbs_work_fn() so that wb->state contains WB_has_dirty_io,
thus old_wb is not skipped when traversing wbs in bdi_split_work_to_wbs(),
and the issue becomes easily reproducible again.
To solve this problem, percpu_ref_exit() is called under RCU protection to
avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs().
Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
and skip the current wb if wb_tryget() fails because the wb has already
been shutdown.
Link: https://lkml.kernel.org/r/20230410130826.1492525-1-libaokun1@huawei.com
Fixes: b817525a4a80 ("writeback: bdi_writeback iteration must not skip dying ones")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Acked-by: Tejun Heo <tj(a)kernel.org>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel(a)dilger.ca>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Dennis Zhou <dennis(a)kernel.org>
Cc: Hou Tao <houtao1(a)huawei.com>
Cc: yangerkun <yangerkun(a)huawei.com>
Cc: Zhang Yi <yi.zhang(a)huawei.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 195dc23e0d83..1db3e3c24b43 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -978,6 +978,16 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
continue;
}
+ /*
+ * If wb_tryget fails, the wb has been shutdown, skip it.
+ *
+ * Pin @wb so that it stays on @bdi->wb_list. This allows
+ * continuing iteration from @wb after dropping and
+ * regrabbing rcu read lock.
+ */
+ if (!wb_tryget(wb))
+ continue;
+
/* alloc failed, execute synchronously using on-stack fallback */
work = &fallback_work;
*work = *base_work;
@@ -986,13 +996,6 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
work->done = &fallback_work_done;
wb_queue_work(wb, work);
-
- /*
- * Pin @wb so that it stays on @bdi->wb_list. This allows
- * continuing iteration from @wb after dropping and
- * regrabbing rcu read lock.
- */
- wb_get(wb);
last_wb = wb;
rcu_read_unlock();
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index a53b9360b72e..30d2d0386fdb 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -507,6 +507,15 @@ static LIST_HEAD(offline_cgwbs);
static void cleanup_offline_cgwbs_workfn(struct work_struct *work);
static DECLARE_WORK(cleanup_offline_cgwbs_work, cleanup_offline_cgwbs_workfn);
+static void cgwb_free_rcu(struct rcu_head *rcu_head)
+{
+ struct bdi_writeback *wb = container_of(rcu_head,
+ struct bdi_writeback, rcu);
+
+ percpu_ref_exit(&wb->refcnt);
+ kfree(wb);
+}
+
static void cgwb_release_workfn(struct work_struct *work)
{
struct bdi_writeback *wb = container_of(work, struct bdi_writeback,
@@ -529,11 +538,10 @@ static void cgwb_release_workfn(struct work_struct *work)
list_del(&wb->offline_node);
spin_unlock_irq(&cgwb_lock);
- percpu_ref_exit(&wb->refcnt);
wb_exit(wb);
bdi_put(bdi);
WARN_ON_ONCE(!list_empty(&wb->b_attached));
- kfree_rcu(wb, rcu);
+ call_rcu(&wb->rcu, cgwb_free_rcu);
}
static void cgwb_release(struct percpu_ref *refcnt)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 1ba1199ec5747f475538c0d25a32804e5ba1dfde
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042224-tricolor-shining-6cd0@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
1ba1199ec574 ("writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs")
c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
f3b6a6df38aa ("writeback, cgroup: keep list of inodes attached to bdi_writeback")
29264d92a0f1 ("writeback, cgroup: switch to rcu_work API in inode_switch_wbs()")
8826ee4fe750 ("writeback, cgroup: increment isw_nr_in_flight before grabbing an inode")
4ade5867b4b8 ("writeback, cgroup: do not switch inodes with I_WILL_FREE flag")
e30942859030 ("Merge tag 'writeback_for_v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1ba1199ec5747f475538c0d25a32804e5ba1dfde Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Mon, 10 Apr 2023 21:08:26 +0800
Subject: [PATCH] writeback, cgroup: fix null-ptr-deref write in
bdi_split_work_to_wbs
KASAN report null-ptr-deref:
==================================================================
BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
Write of size 8 at addr 0000000000000000 by task sync/943
CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
Call Trace:
<TASK>
dump_stack_lvl+0x7f/0xc0
print_report+0x2ba/0x340
kasan_report+0xc4/0x120
kasan_check_range+0x1b7/0x2e0
__kasan_check_write+0x24/0x40
bdi_split_work_to_wbs+0x5c5/0x7b0
sync_inodes_sb+0x195/0x630
sync_inodes_one_sb+0x3a/0x50
iterate_supers+0x106/0x1b0
ksys_sync+0x98/0x160
[...]
==================================================================
The race that causes the above issue is as follows:
cpu1 cpu2
-------------------------|-------------------------
inode_switch_wbs
INIT_WORK(&isw->work, inode_switch_wbs_work_fn)
queue_rcu_work(isw_wq, &isw->work)
// queue_work async
inode_switch_wbs_work_fn
wb_put_many(old_wb, nr_switched)
percpu_ref_put_many
ref->data->release(ref)
cgwb_release
queue_work(cgwb_release_wq, &wb->release_work)
// queue_work async
&wb->release_work
cgwb_release_workfn
ksys_sync
iterate_supers
sync_inodes_one_sb
sync_inodes_sb
bdi_split_work_to_wbs
kmalloc(sizeof(*work), GFP_ATOMIC)
// alloc memory failed
percpu_ref_exit
ref->data = NULL
kfree(data)
wb_get(wb)
percpu_ref_get(&wb->refcnt)
percpu_ref_get_many(ref, 1)
atomic_long_add(nr, &ref->data->count)
atomic64_add(i, v)
// trigger null-ptr-deref
bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all
wbs. If the allocation of new work fails, the on-stack fallback will be
used and the reference count of the current wb is increased afterwards.
If cgroup writeback membership switches occur before getting the reference
count and the current wb is released as old_wd, then calling wb_get() or
wb_put() will trigger the null pointer dereference above.
This issue was introduced in v4.3-rc7 (see fix tag1). Both
sync_inodes_sb() and __writeback_inodes_sb_nr() calls to
bdi_split_work_to_wbs() can trigger this issue. For scenarios called via
sync_inodes_sb(), originally commit 7fc5854f8c6e ("writeback: synchronize
sync(2) against cgroup writeback membership switches") reduced the
possibility of the issue by adding wb_switch_rwsem, but in v5.14-rc1 (see
fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" from
inode_switch_wbs_work_fn() so that wb->state contains WB_has_dirty_io,
thus old_wb is not skipped when traversing wbs in bdi_split_work_to_wbs(),
and the issue becomes easily reproducible again.
To solve this problem, percpu_ref_exit() is called under RCU protection to
avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs().
Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
and skip the current wb if wb_tryget() fails because the wb has already
been shutdown.
Link: https://lkml.kernel.org/r/20230410130826.1492525-1-libaokun1@huawei.com
Fixes: b817525a4a80 ("writeback: bdi_writeback iteration must not skip dying ones")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Acked-by: Tejun Heo <tj(a)kernel.org>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel(a)dilger.ca>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Dennis Zhou <dennis(a)kernel.org>
Cc: Hou Tao <houtao1(a)huawei.com>
Cc: yangerkun <yangerkun(a)huawei.com>
Cc: Zhang Yi <yi.zhang(a)huawei.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 195dc23e0d83..1db3e3c24b43 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -978,6 +978,16 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
continue;
}
+ /*
+ * If wb_tryget fails, the wb has been shutdown, skip it.
+ *
+ * Pin @wb so that it stays on @bdi->wb_list. This allows
+ * continuing iteration from @wb after dropping and
+ * regrabbing rcu read lock.
+ */
+ if (!wb_tryget(wb))
+ continue;
+
/* alloc failed, execute synchronously using on-stack fallback */
work = &fallback_work;
*work = *base_work;
@@ -986,13 +996,6 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
work->done = &fallback_work_done;
wb_queue_work(wb, work);
-
- /*
- * Pin @wb so that it stays on @bdi->wb_list. This allows
- * continuing iteration from @wb after dropping and
- * regrabbing rcu read lock.
- */
- wb_get(wb);
last_wb = wb;
rcu_read_unlock();
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index a53b9360b72e..30d2d0386fdb 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -507,6 +507,15 @@ static LIST_HEAD(offline_cgwbs);
static void cleanup_offline_cgwbs_workfn(struct work_struct *work);
static DECLARE_WORK(cleanup_offline_cgwbs_work, cleanup_offline_cgwbs_workfn);
+static void cgwb_free_rcu(struct rcu_head *rcu_head)
+{
+ struct bdi_writeback *wb = container_of(rcu_head,
+ struct bdi_writeback, rcu);
+
+ percpu_ref_exit(&wb->refcnt);
+ kfree(wb);
+}
+
static void cgwb_release_workfn(struct work_struct *work)
{
struct bdi_writeback *wb = container_of(work, struct bdi_writeback,
@@ -529,11 +538,10 @@ static void cgwb_release_workfn(struct work_struct *work)
list_del(&wb->offline_node);
spin_unlock_irq(&cgwb_lock);
- percpu_ref_exit(&wb->refcnt);
wb_exit(wb);
bdi_put(bdi);
WARN_ON_ONCE(!list_empty(&wb->b_attached));
- kfree_rcu(wb, rcu);
+ call_rcu(&wb->rcu, cgwb_free_rcu);
}
static void cgwb_release(struct percpu_ref *refcnt)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 1ba1199ec5747f475538c0d25a32804e5ba1dfde
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042221-hazing-platonic-1a57@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
1ba1199ec574 ("writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs")
c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
f3b6a6df38aa ("writeback, cgroup: keep list of inodes attached to bdi_writeback")
29264d92a0f1 ("writeback, cgroup: switch to rcu_work API in inode_switch_wbs()")
8826ee4fe750 ("writeback, cgroup: increment isw_nr_in_flight before grabbing an inode")
4ade5867b4b8 ("writeback, cgroup: do not switch inodes with I_WILL_FREE flag")
e30942859030 ("Merge tag 'writeback_for_v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1ba1199ec5747f475538c0d25a32804e5ba1dfde Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Mon, 10 Apr 2023 21:08:26 +0800
Subject: [PATCH] writeback, cgroup: fix null-ptr-deref write in
bdi_split_work_to_wbs
KASAN report null-ptr-deref:
==================================================================
BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
Write of size 8 at addr 0000000000000000 by task sync/943
CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
Call Trace:
<TASK>
dump_stack_lvl+0x7f/0xc0
print_report+0x2ba/0x340
kasan_report+0xc4/0x120
kasan_check_range+0x1b7/0x2e0
__kasan_check_write+0x24/0x40
bdi_split_work_to_wbs+0x5c5/0x7b0
sync_inodes_sb+0x195/0x630
sync_inodes_one_sb+0x3a/0x50
iterate_supers+0x106/0x1b0
ksys_sync+0x98/0x160
[...]
==================================================================
The race that causes the above issue is as follows:
cpu1 cpu2
-------------------------|-------------------------
inode_switch_wbs
INIT_WORK(&isw->work, inode_switch_wbs_work_fn)
queue_rcu_work(isw_wq, &isw->work)
// queue_work async
inode_switch_wbs_work_fn
wb_put_many(old_wb, nr_switched)
percpu_ref_put_many
ref->data->release(ref)
cgwb_release
queue_work(cgwb_release_wq, &wb->release_work)
// queue_work async
&wb->release_work
cgwb_release_workfn
ksys_sync
iterate_supers
sync_inodes_one_sb
sync_inodes_sb
bdi_split_work_to_wbs
kmalloc(sizeof(*work), GFP_ATOMIC)
// alloc memory failed
percpu_ref_exit
ref->data = NULL
kfree(data)
wb_get(wb)
percpu_ref_get(&wb->refcnt)
percpu_ref_get_many(ref, 1)
atomic_long_add(nr, &ref->data->count)
atomic64_add(i, v)
// trigger null-ptr-deref
bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all
wbs. If the allocation of new work fails, the on-stack fallback will be
used and the reference count of the current wb is increased afterwards.
If cgroup writeback membership switches occur before getting the reference
count and the current wb is released as old_wd, then calling wb_get() or
wb_put() will trigger the null pointer dereference above.
This issue was introduced in v4.3-rc7 (see fix tag1). Both
sync_inodes_sb() and __writeback_inodes_sb_nr() calls to
bdi_split_work_to_wbs() can trigger this issue. For scenarios called via
sync_inodes_sb(), originally commit 7fc5854f8c6e ("writeback: synchronize
sync(2) against cgroup writeback membership switches") reduced the
possibility of the issue by adding wb_switch_rwsem, but in v5.14-rc1 (see
fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" from
inode_switch_wbs_work_fn() so that wb->state contains WB_has_dirty_io,
thus old_wb is not skipped when traversing wbs in bdi_split_work_to_wbs(),
and the issue becomes easily reproducible again.
To solve this problem, percpu_ref_exit() is called under RCU protection to
avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs().
Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
and skip the current wb if wb_tryget() fails because the wb has already
been shutdown.
Link: https://lkml.kernel.org/r/20230410130826.1492525-1-libaokun1@huawei.com
Fixes: b817525a4a80 ("writeback: bdi_writeback iteration must not skip dying ones")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Acked-by: Tejun Heo <tj(a)kernel.org>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel(a)dilger.ca>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Dennis Zhou <dennis(a)kernel.org>
Cc: Hou Tao <houtao1(a)huawei.com>
Cc: yangerkun <yangerkun(a)huawei.com>
Cc: Zhang Yi <yi.zhang(a)huawei.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 195dc23e0d83..1db3e3c24b43 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -978,6 +978,16 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
continue;
}
+ /*
+ * If wb_tryget fails, the wb has been shutdown, skip it.
+ *
+ * Pin @wb so that it stays on @bdi->wb_list. This allows
+ * continuing iteration from @wb after dropping and
+ * regrabbing rcu read lock.
+ */
+ if (!wb_tryget(wb))
+ continue;
+
/* alloc failed, execute synchronously using on-stack fallback */
work = &fallback_work;
*work = *base_work;
@@ -986,13 +996,6 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
work->done = &fallback_work_done;
wb_queue_work(wb, work);
-
- /*
- * Pin @wb so that it stays on @bdi->wb_list. This allows
- * continuing iteration from @wb after dropping and
- * regrabbing rcu read lock.
- */
- wb_get(wb);
last_wb = wb;
rcu_read_unlock();
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index a53b9360b72e..30d2d0386fdb 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -507,6 +507,15 @@ static LIST_HEAD(offline_cgwbs);
static void cleanup_offline_cgwbs_workfn(struct work_struct *work);
static DECLARE_WORK(cleanup_offline_cgwbs_work, cleanup_offline_cgwbs_workfn);
+static void cgwb_free_rcu(struct rcu_head *rcu_head)
+{
+ struct bdi_writeback *wb = container_of(rcu_head,
+ struct bdi_writeback, rcu);
+
+ percpu_ref_exit(&wb->refcnt);
+ kfree(wb);
+}
+
static void cgwb_release_workfn(struct work_struct *work)
{
struct bdi_writeback *wb = container_of(work, struct bdi_writeback,
@@ -529,11 +538,10 @@ static void cgwb_release_workfn(struct work_struct *work)
list_del(&wb->offline_node);
spin_unlock_irq(&cgwb_lock);
- percpu_ref_exit(&wb->refcnt);
wb_exit(wb);
bdi_put(bdi);
WARN_ON_ONCE(!list_empty(&wb->b_attached));
- kfree_rcu(wb, rcu);
+ call_rcu(&wb->rcu, cgwb_free_rcu);
}
static void cgwb_release(struct percpu_ref *refcnt)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 1ba1199ec5747f475538c0d25a32804e5ba1dfde
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042219-woof-sanitary-4833@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
1ba1199ec574 ("writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs")
c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
f3b6a6df38aa ("writeback, cgroup: keep list of inodes attached to bdi_writeback")
29264d92a0f1 ("writeback, cgroup: switch to rcu_work API in inode_switch_wbs()")
8826ee4fe750 ("writeback, cgroup: increment isw_nr_in_flight before grabbing an inode")
4ade5867b4b8 ("writeback, cgroup: do not switch inodes with I_WILL_FREE flag")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1ba1199ec5747f475538c0d25a32804e5ba1dfde Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Mon, 10 Apr 2023 21:08:26 +0800
Subject: [PATCH] writeback, cgroup: fix null-ptr-deref write in
bdi_split_work_to_wbs
KASAN report null-ptr-deref:
==================================================================
BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
Write of size 8 at addr 0000000000000000 by task sync/943
CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
Call Trace:
<TASK>
dump_stack_lvl+0x7f/0xc0
print_report+0x2ba/0x340
kasan_report+0xc4/0x120
kasan_check_range+0x1b7/0x2e0
__kasan_check_write+0x24/0x40
bdi_split_work_to_wbs+0x5c5/0x7b0
sync_inodes_sb+0x195/0x630
sync_inodes_one_sb+0x3a/0x50
iterate_supers+0x106/0x1b0
ksys_sync+0x98/0x160
[...]
==================================================================
The race that causes the above issue is as follows:
cpu1 cpu2
-------------------------|-------------------------
inode_switch_wbs
INIT_WORK(&isw->work, inode_switch_wbs_work_fn)
queue_rcu_work(isw_wq, &isw->work)
// queue_work async
inode_switch_wbs_work_fn
wb_put_many(old_wb, nr_switched)
percpu_ref_put_many
ref->data->release(ref)
cgwb_release
queue_work(cgwb_release_wq, &wb->release_work)
// queue_work async
&wb->release_work
cgwb_release_workfn
ksys_sync
iterate_supers
sync_inodes_one_sb
sync_inodes_sb
bdi_split_work_to_wbs
kmalloc(sizeof(*work), GFP_ATOMIC)
// alloc memory failed
percpu_ref_exit
ref->data = NULL
kfree(data)
wb_get(wb)
percpu_ref_get(&wb->refcnt)
percpu_ref_get_many(ref, 1)
atomic_long_add(nr, &ref->data->count)
atomic64_add(i, v)
// trigger null-ptr-deref
bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all
wbs. If the allocation of new work fails, the on-stack fallback will be
used and the reference count of the current wb is increased afterwards.
If cgroup writeback membership switches occur before getting the reference
count and the current wb is released as old_wd, then calling wb_get() or
wb_put() will trigger the null pointer dereference above.
This issue was introduced in v4.3-rc7 (see fix tag1). Both
sync_inodes_sb() and __writeback_inodes_sb_nr() calls to
bdi_split_work_to_wbs() can trigger this issue. For scenarios called via
sync_inodes_sb(), originally commit 7fc5854f8c6e ("writeback: synchronize
sync(2) against cgroup writeback membership switches") reduced the
possibility of the issue by adding wb_switch_rwsem, but in v5.14-rc1 (see
fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" from
inode_switch_wbs_work_fn() so that wb->state contains WB_has_dirty_io,
thus old_wb is not skipped when traversing wbs in bdi_split_work_to_wbs(),
and the issue becomes easily reproducible again.
To solve this problem, percpu_ref_exit() is called under RCU protection to
avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs().
Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
and skip the current wb if wb_tryget() fails because the wb has already
been shutdown.
Link: https://lkml.kernel.org/r/20230410130826.1492525-1-libaokun1@huawei.com
Fixes: b817525a4a80 ("writeback: bdi_writeback iteration must not skip dying ones")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Acked-by: Tejun Heo <tj(a)kernel.org>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel(a)dilger.ca>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Dennis Zhou <dennis(a)kernel.org>
Cc: Hou Tao <houtao1(a)huawei.com>
Cc: yangerkun <yangerkun(a)huawei.com>
Cc: Zhang Yi <yi.zhang(a)huawei.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 195dc23e0d83..1db3e3c24b43 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -978,6 +978,16 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
continue;
}
+ /*
+ * If wb_tryget fails, the wb has been shutdown, skip it.
+ *
+ * Pin @wb so that it stays on @bdi->wb_list. This allows
+ * continuing iteration from @wb after dropping and
+ * regrabbing rcu read lock.
+ */
+ if (!wb_tryget(wb))
+ continue;
+
/* alloc failed, execute synchronously using on-stack fallback */
work = &fallback_work;
*work = *base_work;
@@ -986,13 +996,6 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
work->done = &fallback_work_done;
wb_queue_work(wb, work);
-
- /*
- * Pin @wb so that it stays on @bdi->wb_list. This allows
- * continuing iteration from @wb after dropping and
- * regrabbing rcu read lock.
- */
- wb_get(wb);
last_wb = wb;
rcu_read_unlock();
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index a53b9360b72e..30d2d0386fdb 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -507,6 +507,15 @@ static LIST_HEAD(offline_cgwbs);
static void cleanup_offline_cgwbs_workfn(struct work_struct *work);
static DECLARE_WORK(cleanup_offline_cgwbs_work, cleanup_offline_cgwbs_workfn);
+static void cgwb_free_rcu(struct rcu_head *rcu_head)
+{
+ struct bdi_writeback *wb = container_of(rcu_head,
+ struct bdi_writeback, rcu);
+
+ percpu_ref_exit(&wb->refcnt);
+ kfree(wb);
+}
+
static void cgwb_release_workfn(struct work_struct *work)
{
struct bdi_writeback *wb = container_of(work, struct bdi_writeback,
@@ -529,11 +538,10 @@ static void cgwb_release_workfn(struct work_struct *work)
list_del(&wb->offline_node);
spin_unlock_irq(&cgwb_lock);
- percpu_ref_exit(&wb->refcnt);
wb_exit(wb);
bdi_put(bdi);
WARN_ON_ONCE(!list_empty(&wb->b_attached));
- kfree_rcu(wb, rcu);
+ call_rcu(&wb->rcu, cgwb_free_rcu);
}
static void cgwb_release(struct percpu_ref *refcnt)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 1ba1199ec5747f475538c0d25a32804e5ba1dfde
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042217-zoom-handset-cf9f@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
1ba1199ec574 ("writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1ba1199ec5747f475538c0d25a32804e5ba1dfde Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Mon, 10 Apr 2023 21:08:26 +0800
Subject: [PATCH] writeback, cgroup: fix null-ptr-deref write in
bdi_split_work_to_wbs
KASAN report null-ptr-deref:
==================================================================
BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
Write of size 8 at addr 0000000000000000 by task sync/943
CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
Call Trace:
<TASK>
dump_stack_lvl+0x7f/0xc0
print_report+0x2ba/0x340
kasan_report+0xc4/0x120
kasan_check_range+0x1b7/0x2e0
__kasan_check_write+0x24/0x40
bdi_split_work_to_wbs+0x5c5/0x7b0
sync_inodes_sb+0x195/0x630
sync_inodes_one_sb+0x3a/0x50
iterate_supers+0x106/0x1b0
ksys_sync+0x98/0x160
[...]
==================================================================
The race that causes the above issue is as follows:
cpu1 cpu2
-------------------------|-------------------------
inode_switch_wbs
INIT_WORK(&isw->work, inode_switch_wbs_work_fn)
queue_rcu_work(isw_wq, &isw->work)
// queue_work async
inode_switch_wbs_work_fn
wb_put_many(old_wb, nr_switched)
percpu_ref_put_many
ref->data->release(ref)
cgwb_release
queue_work(cgwb_release_wq, &wb->release_work)
// queue_work async
&wb->release_work
cgwb_release_workfn
ksys_sync
iterate_supers
sync_inodes_one_sb
sync_inodes_sb
bdi_split_work_to_wbs
kmalloc(sizeof(*work), GFP_ATOMIC)
// alloc memory failed
percpu_ref_exit
ref->data = NULL
kfree(data)
wb_get(wb)
percpu_ref_get(&wb->refcnt)
percpu_ref_get_many(ref, 1)
atomic_long_add(nr, &ref->data->count)
atomic64_add(i, v)
// trigger null-ptr-deref
bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all
wbs. If the allocation of new work fails, the on-stack fallback will be
used and the reference count of the current wb is increased afterwards.
If cgroup writeback membership switches occur before getting the reference
count and the current wb is released as old_wd, then calling wb_get() or
wb_put() will trigger the null pointer dereference above.
This issue was introduced in v4.3-rc7 (see fix tag1). Both
sync_inodes_sb() and __writeback_inodes_sb_nr() calls to
bdi_split_work_to_wbs() can trigger this issue. For scenarios called via
sync_inodes_sb(), originally commit 7fc5854f8c6e ("writeback: synchronize
sync(2) against cgroup writeback membership switches") reduced the
possibility of the issue by adding wb_switch_rwsem, but in v5.14-rc1 (see
fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" from
inode_switch_wbs_work_fn() so that wb->state contains WB_has_dirty_io,
thus old_wb is not skipped when traversing wbs in bdi_split_work_to_wbs(),
and the issue becomes easily reproducible again.
To solve this problem, percpu_ref_exit() is called under RCU protection to
avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs().
Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
and skip the current wb if wb_tryget() fails because the wb has already
been shutdown.
Link: https://lkml.kernel.org/r/20230410130826.1492525-1-libaokun1@huawei.com
Fixes: b817525a4a80 ("writeback: bdi_writeback iteration must not skip dying ones")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Acked-by: Tejun Heo <tj(a)kernel.org>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel(a)dilger.ca>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Dennis Zhou <dennis(a)kernel.org>
Cc: Hou Tao <houtao1(a)huawei.com>
Cc: yangerkun <yangerkun(a)huawei.com>
Cc: Zhang Yi <yi.zhang(a)huawei.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 195dc23e0d83..1db3e3c24b43 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -978,6 +978,16 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
continue;
}
+ /*
+ * If wb_tryget fails, the wb has been shutdown, skip it.
+ *
+ * Pin @wb so that it stays on @bdi->wb_list. This allows
+ * continuing iteration from @wb after dropping and
+ * regrabbing rcu read lock.
+ */
+ if (!wb_tryget(wb))
+ continue;
+
/* alloc failed, execute synchronously using on-stack fallback */
work = &fallback_work;
*work = *base_work;
@@ -986,13 +996,6 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
work->done = &fallback_work_done;
wb_queue_work(wb, work);
-
- /*
- * Pin @wb so that it stays on @bdi->wb_list. This allows
- * continuing iteration from @wb after dropping and
- * regrabbing rcu read lock.
- */
- wb_get(wb);
last_wb = wb;
rcu_read_unlock();
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index a53b9360b72e..30d2d0386fdb 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -507,6 +507,15 @@ static LIST_HEAD(offline_cgwbs);
static void cleanup_offline_cgwbs_workfn(struct work_struct *work);
static DECLARE_WORK(cleanup_offline_cgwbs_work, cleanup_offline_cgwbs_workfn);
+static void cgwb_free_rcu(struct rcu_head *rcu_head)
+{
+ struct bdi_writeback *wb = container_of(rcu_head,
+ struct bdi_writeback, rcu);
+
+ percpu_ref_exit(&wb->refcnt);
+ kfree(wb);
+}
+
static void cgwb_release_workfn(struct work_struct *work)
{
struct bdi_writeback *wb = container_of(work, struct bdi_writeback,
@@ -529,11 +538,10 @@ static void cgwb_release_workfn(struct work_struct *work)
list_del(&wb->offline_node);
spin_unlock_irq(&cgwb_lock);
- percpu_ref_exit(&wb->refcnt);
wb_exit(wb);
bdi_put(bdi);
WARN_ON_ONCE(!list_empty(&wb->b_attached));
- kfree_rcu(wb, rcu);
+ call_rcu(&wb->rcu, cgwb_free_rcu);
}
static void cgwb_release(struct percpu_ref *refcnt)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 659c0ce1cb9efc7f58d380ca4bb2a51ae9e30553
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042205-easel-pantry-72d9@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
659c0ce1cb9e ("kernel/sys.c: fix and improve control flow in __sys_setres[ug]id()")
111767c1d86b ("LSM: Signal to SafeSetID when setting group IDs")
40852275a94a ("LSM: add SafeSetID module that gates setid calls")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 659c0ce1cb9efc7f58d380ca4bb2a51ae9e30553 Mon Sep 17 00:00:00 2001
From: Ondrej Mosnacek <omosnace(a)redhat.com>
Date: Fri, 17 Feb 2023 17:21:54 +0100
Subject: [PATCH] kernel/sys.c: fix and improve control flow in
__sys_setres[ug]id()
Linux Security Modules (LSMs) that implement the "capable" hook will
usually emit an access denial message to the audit log whenever they
"block" the current task from using the given capability based on their
security policy.
The occurrence of a denial is used as an indication that the given task
has attempted an operation that requires the given access permission, so
the callers of functions that perform LSM permission checks must take care
to avoid calling them too early (before it is decided if the permission is
actually needed to perform the requested operation).
The __sys_setres[ug]id() functions violate this convention by first
calling ns_capable_setid() and only then checking if the operation
requires the capability or not. It means that any caller that has the
capability granted by DAC (task's capability set) but not by MAC (LSMs)
will generate a "denied" audit record, even if is doing an operation for
which the capability is not required.
Fix this by reordering the checks such that ns_capable_setid() is checked
last and -EPERM is returned immediately if it returns false.
While there, also do two small optimizations:
* move the capability check before prepare_creds() and
* bail out early in case of a no-op.
Link: https://lkml.kernel.org/r/20230217162154.837549-1-omosnace@redhat.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ondrej Mosnacek <omosnace(a)redhat.com>
Cc: Eric W. Biederman <ebiederm(a)xmission.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/kernel/sys.c b/kernel/sys.c
index 495cd87d9bf4..351de7916302 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -664,6 +664,7 @@ long __sys_setresuid(uid_t ruid, uid_t euid, uid_t suid)
struct cred *new;
int retval;
kuid_t kruid, keuid, ksuid;
+ bool ruid_new, euid_new, suid_new;
kruid = make_kuid(ns, ruid);
keuid = make_kuid(ns, euid);
@@ -678,25 +679,29 @@ long __sys_setresuid(uid_t ruid, uid_t euid, uid_t suid)
if ((suid != (uid_t) -1) && !uid_valid(ksuid))
return -EINVAL;
+ old = current_cred();
+
+ /* check for no-op */
+ if ((ruid == (uid_t) -1 || uid_eq(kruid, old->uid)) &&
+ (euid == (uid_t) -1 || (uid_eq(keuid, old->euid) &&
+ uid_eq(keuid, old->fsuid))) &&
+ (suid == (uid_t) -1 || uid_eq(ksuid, old->suid)))
+ return 0;
+
+ ruid_new = ruid != (uid_t) -1 && !uid_eq(kruid, old->uid) &&
+ !uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid);
+ euid_new = euid != (uid_t) -1 && !uid_eq(keuid, old->uid) &&
+ !uid_eq(keuid, old->euid) && !uid_eq(keuid, old->suid);
+ suid_new = suid != (uid_t) -1 && !uid_eq(ksuid, old->uid) &&
+ !uid_eq(ksuid, old->euid) && !uid_eq(ksuid, old->suid);
+ if ((ruid_new || euid_new || suid_new) &&
+ !ns_capable_setid(old->user_ns, CAP_SETUID))
+ return -EPERM;
+
new = prepare_creds();
if (!new)
return -ENOMEM;
- old = current_cred();
-
- retval = -EPERM;
- if (!ns_capable_setid(old->user_ns, CAP_SETUID)) {
- if (ruid != (uid_t) -1 && !uid_eq(kruid, old->uid) &&
- !uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid))
- goto error;
- if (euid != (uid_t) -1 && !uid_eq(keuid, old->uid) &&
- !uid_eq(keuid, old->euid) && !uid_eq(keuid, old->suid))
- goto error;
- if (suid != (uid_t) -1 && !uid_eq(ksuid, old->uid) &&
- !uid_eq(ksuid, old->euid) && !uid_eq(ksuid, old->suid))
- goto error;
- }
-
if (ruid != (uid_t) -1) {
new->uid = kruid;
if (!uid_eq(kruid, old->uid)) {
@@ -761,6 +766,7 @@ long __sys_setresgid(gid_t rgid, gid_t egid, gid_t sgid)
struct cred *new;
int retval;
kgid_t krgid, kegid, ksgid;
+ bool rgid_new, egid_new, sgid_new;
krgid = make_kgid(ns, rgid);
kegid = make_kgid(ns, egid);
@@ -773,23 +779,28 @@ long __sys_setresgid(gid_t rgid, gid_t egid, gid_t sgid)
if ((sgid != (gid_t) -1) && !gid_valid(ksgid))
return -EINVAL;
+ old = current_cred();
+
+ /* check for no-op */
+ if ((rgid == (gid_t) -1 || gid_eq(krgid, old->gid)) &&
+ (egid == (gid_t) -1 || (gid_eq(kegid, old->egid) &&
+ gid_eq(kegid, old->fsgid))) &&
+ (sgid == (gid_t) -1 || gid_eq(ksgid, old->sgid)))
+ return 0;
+
+ rgid_new = rgid != (gid_t) -1 && !gid_eq(krgid, old->gid) &&
+ !gid_eq(krgid, old->egid) && !gid_eq(krgid, old->sgid);
+ egid_new = egid != (gid_t) -1 && !gid_eq(kegid, old->gid) &&
+ !gid_eq(kegid, old->egid) && !gid_eq(kegid, old->sgid);
+ sgid_new = sgid != (gid_t) -1 && !gid_eq(ksgid, old->gid) &&
+ !gid_eq(ksgid, old->egid) && !gid_eq(ksgid, old->sgid);
+ if ((rgid_new || egid_new || sgid_new) &&
+ !ns_capable_setid(old->user_ns, CAP_SETGID))
+ return -EPERM;
+
new = prepare_creds();
if (!new)
return -ENOMEM;
- old = current_cred();
-
- retval = -EPERM;
- if (!ns_capable_setid(old->user_ns, CAP_SETGID)) {
- if (rgid != (gid_t) -1 && !gid_eq(krgid, old->gid) &&
- !gid_eq(krgid, old->egid) && !gid_eq(krgid, old->sgid))
- goto error;
- if (egid != (gid_t) -1 && !gid_eq(kegid, old->gid) &&
- !gid_eq(kegid, old->egid) && !gid_eq(kegid, old->sgid))
- goto error;
- if (sgid != (gid_t) -1 && !gid_eq(ksgid, old->gid) &&
- !gid_eq(ksgid, old->egid) && !gid_eq(ksgid, old->sgid))
- goto error;
- }
if (rgid != (gid_t) -1)
new->gid = krgid;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 659c0ce1cb9efc7f58d380ca4bb2a51ae9e30553
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042203-wad-winter-024f@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
659c0ce1cb9e ("kernel/sys.c: fix and improve control flow in __sys_setres[ug]id()")
111767c1d86b ("LSM: Signal to SafeSetID when setting group IDs")
40852275a94a ("LSM: add SafeSetID module that gates setid calls")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 659c0ce1cb9efc7f58d380ca4bb2a51ae9e30553 Mon Sep 17 00:00:00 2001
From: Ondrej Mosnacek <omosnace(a)redhat.com>
Date: Fri, 17 Feb 2023 17:21:54 +0100
Subject: [PATCH] kernel/sys.c: fix and improve control flow in
__sys_setres[ug]id()
Linux Security Modules (LSMs) that implement the "capable" hook will
usually emit an access denial message to the audit log whenever they
"block" the current task from using the given capability based on their
security policy.
The occurrence of a denial is used as an indication that the given task
has attempted an operation that requires the given access permission, so
the callers of functions that perform LSM permission checks must take care
to avoid calling them too early (before it is decided if the permission is
actually needed to perform the requested operation).
The __sys_setres[ug]id() functions violate this convention by first
calling ns_capable_setid() and only then checking if the operation
requires the capability or not. It means that any caller that has the
capability granted by DAC (task's capability set) but not by MAC (LSMs)
will generate a "denied" audit record, even if is doing an operation for
which the capability is not required.
Fix this by reordering the checks such that ns_capable_setid() is checked
last and -EPERM is returned immediately if it returns false.
While there, also do two small optimizations:
* move the capability check before prepare_creds() and
* bail out early in case of a no-op.
Link: https://lkml.kernel.org/r/20230217162154.837549-1-omosnace@redhat.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ondrej Mosnacek <omosnace(a)redhat.com>
Cc: Eric W. Biederman <ebiederm(a)xmission.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/kernel/sys.c b/kernel/sys.c
index 495cd87d9bf4..351de7916302 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -664,6 +664,7 @@ long __sys_setresuid(uid_t ruid, uid_t euid, uid_t suid)
struct cred *new;
int retval;
kuid_t kruid, keuid, ksuid;
+ bool ruid_new, euid_new, suid_new;
kruid = make_kuid(ns, ruid);
keuid = make_kuid(ns, euid);
@@ -678,25 +679,29 @@ long __sys_setresuid(uid_t ruid, uid_t euid, uid_t suid)
if ((suid != (uid_t) -1) && !uid_valid(ksuid))
return -EINVAL;
+ old = current_cred();
+
+ /* check for no-op */
+ if ((ruid == (uid_t) -1 || uid_eq(kruid, old->uid)) &&
+ (euid == (uid_t) -1 || (uid_eq(keuid, old->euid) &&
+ uid_eq(keuid, old->fsuid))) &&
+ (suid == (uid_t) -1 || uid_eq(ksuid, old->suid)))
+ return 0;
+
+ ruid_new = ruid != (uid_t) -1 && !uid_eq(kruid, old->uid) &&
+ !uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid);
+ euid_new = euid != (uid_t) -1 && !uid_eq(keuid, old->uid) &&
+ !uid_eq(keuid, old->euid) && !uid_eq(keuid, old->suid);
+ suid_new = suid != (uid_t) -1 && !uid_eq(ksuid, old->uid) &&
+ !uid_eq(ksuid, old->euid) && !uid_eq(ksuid, old->suid);
+ if ((ruid_new || euid_new || suid_new) &&
+ !ns_capable_setid(old->user_ns, CAP_SETUID))
+ return -EPERM;
+
new = prepare_creds();
if (!new)
return -ENOMEM;
- old = current_cred();
-
- retval = -EPERM;
- if (!ns_capable_setid(old->user_ns, CAP_SETUID)) {
- if (ruid != (uid_t) -1 && !uid_eq(kruid, old->uid) &&
- !uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid))
- goto error;
- if (euid != (uid_t) -1 && !uid_eq(keuid, old->uid) &&
- !uid_eq(keuid, old->euid) && !uid_eq(keuid, old->suid))
- goto error;
- if (suid != (uid_t) -1 && !uid_eq(ksuid, old->uid) &&
- !uid_eq(ksuid, old->euid) && !uid_eq(ksuid, old->suid))
- goto error;
- }
-
if (ruid != (uid_t) -1) {
new->uid = kruid;
if (!uid_eq(kruid, old->uid)) {
@@ -761,6 +766,7 @@ long __sys_setresgid(gid_t rgid, gid_t egid, gid_t sgid)
struct cred *new;
int retval;
kgid_t krgid, kegid, ksgid;
+ bool rgid_new, egid_new, sgid_new;
krgid = make_kgid(ns, rgid);
kegid = make_kgid(ns, egid);
@@ -773,23 +779,28 @@ long __sys_setresgid(gid_t rgid, gid_t egid, gid_t sgid)
if ((sgid != (gid_t) -1) && !gid_valid(ksgid))
return -EINVAL;
+ old = current_cred();
+
+ /* check for no-op */
+ if ((rgid == (gid_t) -1 || gid_eq(krgid, old->gid)) &&
+ (egid == (gid_t) -1 || (gid_eq(kegid, old->egid) &&
+ gid_eq(kegid, old->fsgid))) &&
+ (sgid == (gid_t) -1 || gid_eq(ksgid, old->sgid)))
+ return 0;
+
+ rgid_new = rgid != (gid_t) -1 && !gid_eq(krgid, old->gid) &&
+ !gid_eq(krgid, old->egid) && !gid_eq(krgid, old->sgid);
+ egid_new = egid != (gid_t) -1 && !gid_eq(kegid, old->gid) &&
+ !gid_eq(kegid, old->egid) && !gid_eq(kegid, old->sgid);
+ sgid_new = sgid != (gid_t) -1 && !gid_eq(ksgid, old->gid) &&
+ !gid_eq(ksgid, old->egid) && !gid_eq(ksgid, old->sgid);
+ if ((rgid_new || egid_new || sgid_new) &&
+ !ns_capable_setid(old->user_ns, CAP_SETGID))
+ return -EPERM;
+
new = prepare_creds();
if (!new)
return -ENOMEM;
- old = current_cred();
-
- retval = -EPERM;
- if (!ns_capable_setid(old->user_ns, CAP_SETGID)) {
- if (rgid != (gid_t) -1 && !gid_eq(krgid, old->gid) &&
- !gid_eq(krgid, old->egid) && !gid_eq(krgid, old->sgid))
- goto error;
- if (egid != (gid_t) -1 && !gid_eq(kegid, old->gid) &&
- !gid_eq(kegid, old->egid) && !gid_eq(kegid, old->sgid))
- goto error;
- if (sgid != (gid_t) -1 && !gid_eq(ksgid, old->gid) &&
- !gid_eq(ksgid, old->egid) && !gid_eq(ksgid, old->sgid))
- goto error;
- }
if (rgid != (gid_t) -1)
new->gid = krgid;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 659c0ce1cb9efc7f58d380ca4bb2a51ae9e30553
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023042201-superior-freebase-a71f@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
659c0ce1cb9e ("kernel/sys.c: fix and improve control flow in __sys_setres[ug]id()")
111767c1d86b ("LSM: Signal to SafeSetID when setting group IDs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 659c0ce1cb9efc7f58d380ca4bb2a51ae9e30553 Mon Sep 17 00:00:00 2001
From: Ondrej Mosnacek <omosnace(a)redhat.com>
Date: Fri, 17 Feb 2023 17:21:54 +0100
Subject: [PATCH] kernel/sys.c: fix and improve control flow in
__sys_setres[ug]id()
Linux Security Modules (LSMs) that implement the "capable" hook will
usually emit an access denial message to the audit log whenever they
"block" the current task from using the given capability based on their
security policy.
The occurrence of a denial is used as an indication that the given task
has attempted an operation that requires the given access permission, so
the callers of functions that perform LSM permission checks must take care
to avoid calling them too early (before it is decided if the permission is
actually needed to perform the requested operation).
The __sys_setres[ug]id() functions violate this convention by first
calling ns_capable_setid() and only then checking if the operation
requires the capability or not. It means that any caller that has the
capability granted by DAC (task's capability set) but not by MAC (LSMs)
will generate a "denied" audit record, even if is doing an operation for
which the capability is not required.
Fix this by reordering the checks such that ns_capable_setid() is checked
last and -EPERM is returned immediately if it returns false.
While there, also do two small optimizations:
* move the capability check before prepare_creds() and
* bail out early in case of a no-op.
Link: https://lkml.kernel.org/r/20230217162154.837549-1-omosnace@redhat.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ondrej Mosnacek <omosnace(a)redhat.com>
Cc: Eric W. Biederman <ebiederm(a)xmission.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/kernel/sys.c b/kernel/sys.c
index 495cd87d9bf4..351de7916302 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -664,6 +664,7 @@ long __sys_setresuid(uid_t ruid, uid_t euid, uid_t suid)
struct cred *new;
int retval;
kuid_t kruid, keuid, ksuid;
+ bool ruid_new, euid_new, suid_new;
kruid = make_kuid(ns, ruid);
keuid = make_kuid(ns, euid);
@@ -678,25 +679,29 @@ long __sys_setresuid(uid_t ruid, uid_t euid, uid_t suid)
if ((suid != (uid_t) -1) && !uid_valid(ksuid))
return -EINVAL;
+ old = current_cred();
+
+ /* check for no-op */
+ if ((ruid == (uid_t) -1 || uid_eq(kruid, old->uid)) &&
+ (euid == (uid_t) -1 || (uid_eq(keuid, old->euid) &&
+ uid_eq(keuid, old->fsuid))) &&
+ (suid == (uid_t) -1 || uid_eq(ksuid, old->suid)))
+ return 0;
+
+ ruid_new = ruid != (uid_t) -1 && !uid_eq(kruid, old->uid) &&
+ !uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid);
+ euid_new = euid != (uid_t) -1 && !uid_eq(keuid, old->uid) &&
+ !uid_eq(keuid, old->euid) && !uid_eq(keuid, old->suid);
+ suid_new = suid != (uid_t) -1 && !uid_eq(ksuid, old->uid) &&
+ !uid_eq(ksuid, old->euid) && !uid_eq(ksuid, old->suid);
+ if ((ruid_new || euid_new || suid_new) &&
+ !ns_capable_setid(old->user_ns, CAP_SETUID))
+ return -EPERM;
+
new = prepare_creds();
if (!new)
return -ENOMEM;
- old = current_cred();
-
- retval = -EPERM;
- if (!ns_capable_setid(old->user_ns, CAP_SETUID)) {
- if (ruid != (uid_t) -1 && !uid_eq(kruid, old->uid) &&
- !uid_eq(kruid, old->euid) && !uid_eq(kruid, old->suid))
- goto error;
- if (euid != (uid_t) -1 && !uid_eq(keuid, old->uid) &&
- !uid_eq(keuid, old->euid) && !uid_eq(keuid, old->suid))
- goto error;
- if (suid != (uid_t) -1 && !uid_eq(ksuid, old->uid) &&
- !uid_eq(ksuid, old->euid) && !uid_eq(ksuid, old->suid))
- goto error;
- }
-
if (ruid != (uid_t) -1) {
new->uid = kruid;
if (!uid_eq(kruid, old->uid)) {
@@ -761,6 +766,7 @@ long __sys_setresgid(gid_t rgid, gid_t egid, gid_t sgid)
struct cred *new;
int retval;
kgid_t krgid, kegid, ksgid;
+ bool rgid_new, egid_new, sgid_new;
krgid = make_kgid(ns, rgid);
kegid = make_kgid(ns, egid);
@@ -773,23 +779,28 @@ long __sys_setresgid(gid_t rgid, gid_t egid, gid_t sgid)
if ((sgid != (gid_t) -1) && !gid_valid(ksgid))
return -EINVAL;
+ old = current_cred();
+
+ /* check for no-op */
+ if ((rgid == (gid_t) -1 || gid_eq(krgid, old->gid)) &&
+ (egid == (gid_t) -1 || (gid_eq(kegid, old->egid) &&
+ gid_eq(kegid, old->fsgid))) &&
+ (sgid == (gid_t) -1 || gid_eq(ksgid, old->sgid)))
+ return 0;
+
+ rgid_new = rgid != (gid_t) -1 && !gid_eq(krgid, old->gid) &&
+ !gid_eq(krgid, old->egid) && !gid_eq(krgid, old->sgid);
+ egid_new = egid != (gid_t) -1 && !gid_eq(kegid, old->gid) &&
+ !gid_eq(kegid, old->egid) && !gid_eq(kegid, old->sgid);
+ sgid_new = sgid != (gid_t) -1 && !gid_eq(ksgid, old->gid) &&
+ !gid_eq(ksgid, old->egid) && !gid_eq(ksgid, old->sgid);
+ if ((rgid_new || egid_new || sgid_new) &&
+ !ns_capable_setid(old->user_ns, CAP_SETGID))
+ return -EPERM;
+
new = prepare_creds();
if (!new)
return -ENOMEM;
- old = current_cred();
-
- retval = -EPERM;
- if (!ns_capable_setid(old->user_ns, CAP_SETGID)) {
- if (rgid != (gid_t) -1 && !gid_eq(krgid, old->gid) &&
- !gid_eq(krgid, old->egid) && !gid_eq(krgid, old->sgid))
- goto error;
- if (egid != (gid_t) -1 && !gid_eq(kegid, old->gid) &&
- !gid_eq(kegid, old->egid) && !gid_eq(kegid, old->sgid))
- goto error;
- if (sgid != (gid_t) -1 && !gid_eq(ksgid, old->gid) &&
- !gid_eq(ksgid, old->egid) && !gid_eq(ksgid, old->sgid))
- goto error;
- }
if (rgid != (gid_t) -1)
new->gid = krgid;
From: Filipe Manana <fdmanana(a)suse.com>
commit d47704bd1c78c85831561bcf701b90dd66f811b2 upstream.
At find_delalloc_subrange(), when we need to get the next extent map, we
do a full search on the extent map tree (a red black tree). This is fine
but it's a lot more efficient to simply use rb_next(), which typically
requires iterating over less nodes of the tree and never needs to compare
the ranges of nodes with the one we are looking for.
So add a public helper to extent_map.{h,c} to get the extent map that
immediately follows another extent map, using rb_next(), and use that
helper at find_delalloc_subrange().
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
---
Please add this patch to the next 6.1 stable release.
It happens to fix a bug recently reported at:
https://bugzilla.redhat.com/show_bug.cgi?id=2187312
Thanks.
fs/btrfs/extent_map.c | 31 +++++++++++++++++++++++++++++-
fs/btrfs/extent_map.h | 2 ++
fs/btrfs/file.c | 44 ++++++++++++++++++++++++++-----------------
3 files changed, 59 insertions(+), 18 deletions(-)
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index b8ae02aa632e..4abbe4b35253 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -523,7 +523,7 @@ void replace_extent_mapping(struct extent_map_tree *tree,
setup_extent_mapping(tree, new, modified);
}
-static struct extent_map *next_extent_map(struct extent_map *em)
+static struct extent_map *next_extent_map(const struct extent_map *em)
{
struct rb_node *next;
@@ -533,6 +533,35 @@ static struct extent_map *next_extent_map(struct extent_map *em)
return container_of(next, struct extent_map, rb_node);
}
+/*
+ * Get the extent map that immediately follows another one.
+ *
+ * @tree: The extent map tree that the extent map belong to.
+ * Holding read or write access on the tree's lock is required.
+ * @em: An extent map from the given tree. The caller must ensure that
+ * between getting @em and between calling this function, the
+ * extent map @em is not removed from the tree - for example, by
+ * holding the tree's lock for the duration of those 2 operations.
+ *
+ * Returns the extent map that immediately follows @em, or NULL if @em is the
+ * last extent map in the tree.
+ */
+struct extent_map *btrfs_next_extent_map(const struct extent_map_tree *tree,
+ const struct extent_map *em)
+{
+ struct extent_map *next;
+
+ /* The lock must be acquired either in read mode or write mode. */
+ lockdep_assert_held(&tree->lock);
+ ASSERT(extent_map_in_tree(em));
+
+ next = next_extent_map(em);
+ if (next)
+ refcount_inc(&next->refs);
+
+ return next;
+}
+
static struct extent_map *prev_extent_map(struct extent_map *em)
{
struct rb_node *prev;
diff --git a/fs/btrfs/extent_map.h b/fs/btrfs/extent_map.h
index ad311864272a..68d3f2c9ea1d 100644
--- a/fs/btrfs/extent_map.h
+++ b/fs/btrfs/extent_map.h
@@ -87,6 +87,8 @@ static inline u64 extent_map_block_end(struct extent_map *em)
void extent_map_tree_init(struct extent_map_tree *tree);
struct extent_map *lookup_extent_mapping(struct extent_map_tree *tree,
u64 start, u64 len);
+struct extent_map *btrfs_next_extent_map(const struct extent_map_tree *tree,
+ const struct extent_map *em);
int add_extent_mapping(struct extent_map_tree *tree,
struct extent_map *em, int modified);
void remove_extent_mapping(struct extent_map_tree *tree, struct extent_map *em);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 1bda59c68360..77202addead8 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -3248,40 +3248,50 @@ static bool find_delalloc_subrange(struct btrfs_inode *inode, u64 start, u64 end
*/
read_lock(&em_tree->lock);
em = lookup_extent_mapping(em_tree, start, len);
- read_unlock(&em_tree->lock);
+ if (!em) {
+ read_unlock(&em_tree->lock);
+ return (delalloc_len > 0);
+ }
/* extent_map_end() returns a non-inclusive end offset. */
- em_end = em ? extent_map_end(em) : 0;
+ em_end = extent_map_end(em);
/*
* If we have a hole/prealloc extent map, check the next one if this one
* ends before our range's end.
*/
- if (em && (em->block_start == EXTENT_MAP_HOLE ||
- test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) && em_end < end) {
+ if ((em->block_start == EXTENT_MAP_HOLE ||
+ test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) && em_end < end) {
struct extent_map *next_em;
- read_lock(&em_tree->lock);
- next_em = lookup_extent_mapping(em_tree, em_end, len - em_end);
- read_unlock(&em_tree->lock);
-
+ next_em = btrfs_next_extent_map(em_tree, em);
free_extent_map(em);
- em_end = next_em ? extent_map_end(next_em) : 0;
+
+ /*
+ * There's no next extent map or the next one starts beyond our
+ * range, return the range found in the io tree (if any).
+ */
+ if (!next_em || next_em->start > end) {
+ read_unlock(&em_tree->lock);
+ free_extent_map(next_em);
+ return (delalloc_len > 0);
+ }
+
+ em_end = extent_map_end(next_em);
em = next_em;
}
- if (em && (em->block_start == EXTENT_MAP_HOLE ||
- test_bit(EXTENT_FLAG_PREALLOC, &em->flags))) {
- free_extent_map(em);
- em = NULL;
- }
+ read_unlock(&em_tree->lock);
/*
- * No extent map or one for a hole or prealloc extent. Use the delalloc
- * range we found in the io tree if we have one.
+ * We have a hole or prealloc extent that ends at or beyond our range's
+ * end, return the range found in the io tree (if any).
*/
- if (!em)
+ if (em->block_start == EXTENT_MAP_HOLE ||
+ test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) {
+ free_extent_map(em);
return (delalloc_len > 0);
+ }
/*
* We don't have any range as EXTENT_DELALLOC in the io tree, so the
--
2.34.1
From: Brian Foster <bfoster(a)redhat.com>
commit 7cd3099f4925d7c15887d1940ebd65acd66100f5 upstream.
Per-inode ioend completion batching has a log reservation deadlock
vector between preallocated append transactions and transactions
that are acquired at completion time for other purposes (i.e.,
unwritten extent conversion or COW fork remaps). For example, if the
ioend completion workqueue task executes on a batch of ioends that
are sorted such that an append ioend sits at the tail, it's possible
for the outstanding append transaction reservation to block
allocation of transactions required to process preceding ioends in
the list.
Append ioend completion is historically the common path for on-disk
inode size updates. While file extending writes may have completed
sometime earlier, the on-disk inode size is only updated after
successful writeback completion. These transactions are preallocated
serially from writeback context to mitigate concurrency and
associated log reservation pressure across completions processed by
multi-threaded workqueue tasks.
However, now that delalloc blocks unconditionally map to unwritten
extents at physical block allocation time, size updates via append
ioends are relatively rare. This means that inode size updates most
commonly occur as part of the preexisting completion time
transaction to convert unwritten extents. As a result, there is no
longer a strong need to preallocate size update transactions.
Remove the preallocation of inode size update transactions to avoid
the ioend completion processing log reservation deadlock. Instead,
continue to send all potential size extending ioends to workqueue
context for completion and allocate the transaction from that
context. This ensures that no outstanding log reservation is owned
by the ioend completion worker task when it begins to process
ioends.
Signed-off-by: Brian Foster <bfoster(a)redhat.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Darrick J. Wong <djwong(a)kernel.org>
Reported-by: Christian Theune <ct(a)flyingcircus.io>
Link: https://lore.kernel.org/linux-xfs/CAOQ4uxjj2UqA0h4Y31NbmpHksMhVrXfXjLG4Tnz3…
Signed-off-by: Amir Goldstein <amir73il(a)gmail.com>
Acked-by: Darrick J. Wong <djwong(a)kernel.org>
---
Greg,
One more fix from v5.13 that I missed from my backports.
Thanks,
Amir.
fs/xfs/xfs_aops.c | 45 +++------------------------------------------
1 file changed, 3 insertions(+), 42 deletions(-)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 953de843d9c3..e341d6531e68 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -39,33 +39,6 @@ static inline bool xfs_ioend_is_append(struct iomap_ioend *ioend)
XFS_I(ioend->io_inode)->i_d.di_size;
}
-STATIC int
-xfs_setfilesize_trans_alloc(
- struct iomap_ioend *ioend)
-{
- struct xfs_mount *mp = XFS_I(ioend->io_inode)->i_mount;
- struct xfs_trans *tp;
- int error;
-
- error = xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp);
- if (error)
- return error;
-
- ioend->io_private = tp;
-
- /*
- * We may pass freeze protection with a transaction. So tell lockdep
- * we released it.
- */
- __sb_writers_release(ioend->io_inode->i_sb, SB_FREEZE_FS);
- /*
- * We hand off the transaction to the completion thread now, so
- * clear the flag here.
- */
- xfs_trans_clear_context(tp);
- return 0;
-}
-
/*
* Update on-disk file size now that data has been written to disk.
*/
@@ -191,12 +164,10 @@ xfs_end_ioend(
error = xfs_reflink_end_cow(ip, offset, size);
else if (ioend->io_type == IOMAP_UNWRITTEN)
error = xfs_iomap_write_unwritten(ip, offset, size, false);
- else
- ASSERT(!xfs_ioend_is_append(ioend) || ioend->io_private);
+ if (!error && xfs_ioend_is_append(ioend))
+ error = xfs_setfilesize(ip, ioend->io_offset, ioend->io_size);
done:
- if (ioend->io_private)
- error = xfs_setfilesize_ioend(ioend, error);
iomap_finish_ioends(ioend, error);
memalloc_nofs_restore(nofs_flag);
}
@@ -246,7 +217,7 @@ xfs_end_io(
static inline bool xfs_ioend_needs_workqueue(struct iomap_ioend *ioend)
{
- return ioend->io_private ||
+ return xfs_ioend_is_append(ioend) ||
ioend->io_type == IOMAP_UNWRITTEN ||
(ioend->io_flags & IOMAP_F_SHARED);
}
@@ -259,8 +230,6 @@ xfs_end_bio(
struct xfs_inode *ip = XFS_I(ioend->io_inode);
unsigned long flags;
- ASSERT(xfs_ioend_needs_workqueue(ioend));
-
spin_lock_irqsave(&ip->i_ioend_lock, flags);
if (list_empty(&ip->i_ioend_list))
WARN_ON_ONCE(!queue_work(ip->i_mount->m_unwritten_workqueue,
@@ -510,14 +479,6 @@ xfs_prepare_ioend(
ioend->io_offset, ioend->io_size);
}
- /* Reserve log space if we might write beyond the on-disk inode size. */
- if (!status &&
- ((ioend->io_flags & IOMAP_F_SHARED) ||
- ioend->io_type != IOMAP_UNWRITTEN) &&
- xfs_ioend_is_append(ioend) &&
- !ioend->io_private)
- status = xfs_setfilesize_trans_alloc(ioend);
-
memalloc_nofs_restore(nofs_flag);
if (xfs_ioend_needs_workqueue(ioend))
--
2.34.1
commit 542a56e8eb4467ae654eefab31ff194569db39cd upstream.
The VCN firmware loading path enables the indirect SRAM mode if it's
advertised as supported. We might have some cases of FW issues that
prevents this mode to working properly though, ending-up in a failed
probe. An example below, observed in the Steam Deck:
[...]
[drm] failed to load ucode VCN0_RAM(0x3A)
[drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0000)
amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec_0 test failed (-110)
[drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block <vcn_v3_0> failed -110
amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_init failed
amdgpu 0000:04:00.0: amdgpu: Fatal error during GPU init
[...]
Disabling the VCN block circumvents this, but it's a very invasive
workaround that turns off the entire feature. So, let's add a quirk
on VCN loading that checks for known problematic BIOSes on Vangogh,
so we can proactively disable the indirect SRAM mode and allow the
HW proper probe and VCN IP block to work fine.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2385
Fixes: 82132ecc5432 ("drm/amdgpu: enable Vangogh VCN indirect sram mode")
Fixes: 9a8cc8cabc1e ("drm/amdgpu: enable Vangogh VCN indirect sram mode")
Cc: stable(a)vger.kernel.org
Cc: James Zhu <James.Zhu(a)amd.com>
Cc: Leo Liu <leo.liu(a)amd.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
---
Hi folks, this was build/boot tested on Deck. I've also adjusted the
context, function was reworked on 6.2.
But what a surprise was for me not see this fix already in 6.1.y, since
I've CCed stable, and the reason for that is really peculiar:
$ git log -1 --pretty="%an <%ae>: %s" 82132ecc5432
Leo Liu <leo.liu(a)amd.com>: drm/amdgpu: enable Vangogh VCN indirect sram mode
$ git describe --contains 82132ecc5432
v6.2-rc1~124^2~1^2~13
$ git log -1 --pretty="%an <%ae>: %s" 9a8cc8cabc1e
Leo Liu <leo.liu(a)amd.com>: drm/amdgpu: enable Vangogh VCN indirect sram mode
$ git describe --contains 9a8cc8cabc1e
v6.1-rc8~16^2^2
This is quite strange for me, we have 2 commit hashes pointing to the *same*
commit, and each one is present..in a different release !!?!
Since I've marked this patch as fixing 82132ecc5432 originally, 6.1.y stable
misses it, since it only contains 9a8cc8cabc1e (which is the same patch!).
Alex, do you have an idea why sometimes commits from the AMD tree appear
duplicate in mainline? Specially in different releases, this could cause
some confusion I guess.
Thanks in advance,
Guilherme
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index ce64ca1c6e66..5c1193dd7d88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -26,6 +26,7 @@
#include <linux/firmware.h>
#include <linux/module.h>
+#include <linux/dmi.h>
#include <linux/pci.h>
#include <linux/debugfs.h>
#include <drm/drm_drv.h>
@@ -84,6 +85,7 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
{
unsigned long bo_size;
const char *fw_name;
+ const char *bios_ver;
const struct common_firmware_header *hdr;
unsigned char fw_check;
unsigned int fw_shared_size, log_offset;
@@ -159,6 +161,21 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
(adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
adev->vcn.indirect_sram = true;
+ /*
+ * Some Steam Deck's BIOS versions are incompatible with the
+ * indirect SRAM mode, leading to amdgpu being unable to get
+ * properly probed (and even potentially crashing the kernel).
+ * Hence, check for these versions here - notice this is
+ * restricted to Vangogh (Deck's APU).
+ */
+ bios_ver = dmi_get_system_info(DMI_BIOS_VERSION);
+
+ if (bios_ver && (!strncmp("F7A0113", bios_ver, 7) ||
+ !strncmp("F7A0114", bios_ver, 7))) {
+ adev->vcn.indirect_sram = false;
+ dev_info(adev->dev,
+ "Steam Deck quirk: indirect SRAM disabled on BIOS %s\n", bios_ver);
+ }
break;
case IP_VERSION(3, 0, 16):
fw_name = FIRMWARE_DIMGREY_CAVEFISH;
--
2.40.0
From: Mel Gorman <mgorman(a)techsingularity.net>
commit 1c0908d8e441631f5b8ba433523cf39339ee2ba0 upstream.
Jan Kara reported the following bug triggering on 6.0.5-rt14 running dbench
on XFS on arm64.
kernel BUG at fs/inode.c:625!
Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP
CPU: 11 PID: 6611 Comm: dbench Tainted: G E 6.0.0-rt14-rt+ #1
pc : clear_inode+0xa0/0xc0
lr : clear_inode+0x38/0xc0
Call trace:
clear_inode+0xa0/0xc0
evict+0x160/0x180
iput+0x154/0x240
do_unlinkat+0x184/0x300
__arm64_sys_unlinkat+0x48/0xc0
el0_svc_common.constprop.4+0xe4/0x2c0
do_el0_svc+0xac/0x100
el0_svc+0x78/0x200
el0t_64_sync_handler+0x9c/0xc0
el0t_64_sync+0x19c/0x1a0
It also affects 6.1-rc7-rt5 and affects a preempt-rt fork of 5.14 so this
is likely a bug that existed forever and only became visible when ARM
support was added to preempt-rt. The same problem does not occur on x86-64
and he also reported that converting sb->s_inode_wblist_lock to
raw_spinlock_t makes the problem disappear indicating that the RT spinlock
variant is the problem.
Which in turn means that RT mutexes on ARM64 and any other weakly ordered
architecture are affected by this independent of RT.
Will Deacon observed:
"I'd be more inclined to be suspicious of the slowpath tbh, as we need to
make sure that we have acquire semantics on all paths where the lock can
be taken. Looking at the rtmutex code, this really isn't obvious to me
-- for example, try_to_take_rt_mutex() appears to be able to return via
the 'takeit' label without acquire semantics and it looks like we might
be relying on the caller's subsequent _unlock_ of the wait_lock for
ordering, but that will give us release semantics which aren't correct."
Sebastian Andrzej Siewior prototyped a fix that does work based on that
comment but it was a little bit overkill and added some fences that should
not be necessary.
The lock owner is updated with an IRQ-safe raw spinlock held, but the
spin_unlock does not provide acquire semantics which are needed when
acquiring a mutex.
Adds the necessary acquire semantics for lock owner updates in the slow path
acquisition and the waiter bit logic.
It successfully completed 10 iterations of the dbench workload while the
vanilla kernel fails on the first iteration.
[ bigeasy(a)linutronix.de: Initial prototype fix ]
Fixes: 700318d1d7b38 ("locking/rtmutex: Use acquire/release semantics")
Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
Reported-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Mel Gorman <mgorman(a)techsingularity.net>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20221202100223.6mevpbl7i6x5udfd@techsingularity.n…
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
---
Could this be please backported to 5.15 and earlier? It is already part
of the 6.X kernels. I asked about this by the end of January and I'm
kindly asking again ;)
This patch applies against v5.15. Should it not apply to earlier
versions, please let me know an I kindly provide a backport.
I received reports that this fixes "mysterious" crashes and that is how
I noticed that it is not part of the earlier kernels.
kernel/locking/rtmutex.c | 55 ++++++++++++++++++++++++++++++------
kernel/locking/rtmutex_api.c | 6 ++--
2 files changed, 49 insertions(+), 12 deletions(-)
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index ea5a701ab2408..c9b21fd30bed5 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -87,15 +87,31 @@ static inline int __ww_mutex_check_kill(struct rt_mutex *lock,
* set this bit before looking at the lock.
*/
-static __always_inline void
-rt_mutex_set_owner(struct rt_mutex_base *lock, struct task_struct *owner)
+static __always_inline struct task_struct *
+rt_mutex_owner_encode(struct rt_mutex_base *lock, struct task_struct *owner)
{
unsigned long val = (unsigned long)owner;
if (rt_mutex_has_waiters(lock))
val |= RT_MUTEX_HAS_WAITERS;
- WRITE_ONCE(lock->owner, (struct task_struct *)val);
+ return (struct task_struct *)val;
+}
+
+static __always_inline void
+rt_mutex_set_owner(struct rt_mutex_base *lock, struct task_struct *owner)
+{
+ /*
+ * lock->wait_lock is held but explicit acquire semantics are needed
+ * for a new lock owner so WRITE_ONCE is insufficient.
+ */
+ xchg_acquire(&lock->owner, rt_mutex_owner_encode(lock, owner));
+}
+
+static __always_inline void rt_mutex_clear_owner(struct rt_mutex_base *lock)
+{
+ /* lock->wait_lock is held so the unlock provides release semantics. */
+ WRITE_ONCE(lock->owner, rt_mutex_owner_encode(lock, NULL));
}
static __always_inline void clear_rt_mutex_waiters(struct rt_mutex_base *lock)
@@ -104,7 +120,8 @@ static __always_inline void clear_rt_mutex_waiters(struct rt_mutex_base *lock)
((unsigned long)lock->owner & ~RT_MUTEX_HAS_WAITERS);
}
-static __always_inline void fixup_rt_mutex_waiters(struct rt_mutex_base *lock)
+static __always_inline void
+fixup_rt_mutex_waiters(struct rt_mutex_base *lock, bool acquire_lock)
{
unsigned long owner, *p = (unsigned long *) &lock->owner;
@@ -170,8 +187,21 @@ static __always_inline void fixup_rt_mutex_waiters(struct rt_mutex_base *lock)
* still set.
*/
owner = READ_ONCE(*p);
- if (owner & RT_MUTEX_HAS_WAITERS)
- WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
+ if (owner & RT_MUTEX_HAS_WAITERS) {
+ /*
+ * See rt_mutex_set_owner() and rt_mutex_clear_owner() on
+ * why xchg_acquire() is used for updating owner for
+ * locking and WRITE_ONCE() for unlocking.
+ *
+ * WRITE_ONCE() would work for the acquire case too, but
+ * in case that the lock acquisition failed it might
+ * force other lockers into the slow path unnecessarily.
+ */
+ if (acquire_lock)
+ xchg_acquire(p, owner & ~RT_MUTEX_HAS_WAITERS);
+ else
+ WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
+ }
}
/*
@@ -206,6 +236,13 @@ static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
owner = *p;
} while (cmpxchg_relaxed(p, owner,
owner | RT_MUTEX_HAS_WAITERS) != owner);
+
+ /*
+ * The cmpxchg loop above is relaxed to avoid back-to-back ACQUIRE
+ * operations in the event of contention. Ensure the successful
+ * cmpxchg is visible.
+ */
+ smp_mb__after_atomic();
}
/*
@@ -1231,7 +1268,7 @@ static int __sched __rt_mutex_slowtrylock(struct rt_mutex_base *lock)
* try_to_take_rt_mutex() sets the lock waiters bit
* unconditionally. Clean this up.
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, true);
return ret;
}
@@ -1591,7 +1628,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
* try_to_take_rt_mutex() sets the waiter bit
* unconditionally. We might have to fix that up.
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, true);
return ret;
}
@@ -1701,7 +1738,7 @@ static void __sched rtlock_slowlock_locked(struct rt_mutex_base *lock)
* try_to_take_rt_mutex() sets the waiter bit unconditionally.
* We might have to fix that up:
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, true);
debug_rt_mutex_free_waiter(&waiter);
}
diff --git a/kernel/locking/rtmutex_api.c b/kernel/locking/rtmutex_api.c
index 5c9299aaabae1..a461be2f873db 100644
--- a/kernel/locking/rtmutex_api.c
+++ b/kernel/locking/rtmutex_api.c
@@ -245,7 +245,7 @@ void __sched rt_mutex_init_proxy_locked(struct rt_mutex_base *lock,
void __sched rt_mutex_proxy_unlock(struct rt_mutex_base *lock)
{
debug_rt_mutex_proxy_unlock(lock);
- rt_mutex_set_owner(lock, NULL);
+ rt_mutex_clear_owner(lock);
}
/**
@@ -360,7 +360,7 @@ int __sched rt_mutex_wait_proxy_lock(struct rt_mutex_base *lock,
* try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
* have to fix that up.
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, true);
raw_spin_unlock_irq(&lock->wait_lock);
return ret;
@@ -416,7 +416,7 @@ bool __sched rt_mutex_cleanup_proxy_lock(struct rt_mutex_base *lock,
* try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
* have to fix that up.
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, false);
raw_spin_unlock_irq(&lock->wait_lock);
--
2.39.1
In upstream commit 77e52ae35463 ("futex: Move to kernel/futex/") the
futex code from kernel/futex.c was moved into kernel/futex/core.c in
preparation of the split-up of the implementation in various files.
Point kernel-doc references to the new files as otherwise the
documentation shows errors on build:
[...]
Error: Cannot open file ./kernel/futex.c
Error: Cannot open file ./kernel/futex.c
[...]
WARNING: kernel-doc './scripts/kernel-doc -rst -enable-lineno -sphinx-version 3.4.3 -internal ./kernel/futex.c' failed with return code 2
There is no direct upstream commit for this change. It is made in
analogy to commit bc67f1c454fb ("docs: futex: Fix kernel-doc
references") applied as consequence of the restructuring of the futex
code.
Fixes: 77e52ae35463 ("futex: Move to kernel/futex/")
Signed-off-by: Salvatore Bonaccorso <carnil(a)debian.org>
---
v1->v2:
- Fix typo in description about new target file for futex.c code
- Indent block with build log output
Documentation/kernel-hacking/locking.rst | 2 +-
Documentation/translations/it_IT/kernel-hacking/locking.rst | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/Documentation/kernel-hacking/locking.rst b/Documentation/kernel-hacking/locking.rst
index 6ed806e6061b..a6d89efede79 100644
--- a/Documentation/kernel-hacking/locking.rst
+++ b/Documentation/kernel-hacking/locking.rst
@@ -1358,7 +1358,7 @@ Mutex API reference
Futex API reference
===================
-.. kernel-doc:: kernel/futex.c
+.. kernel-doc:: kernel/futex/core.c
:internal:
Further reading
diff --git a/Documentation/translations/it_IT/kernel-hacking/locking.rst b/Documentation/translations/it_IT/kernel-hacking/locking.rst
index bf1acd6204ef..192ab8e28125 100644
--- a/Documentation/translations/it_IT/kernel-hacking/locking.rst
+++ b/Documentation/translations/it_IT/kernel-hacking/locking.rst
@@ -1400,7 +1400,7 @@ Riferimento per l'API dei Mutex
Riferimento per l'API dei Futex
===============================
-.. kernel-doc:: kernel/futex.c
+.. kernel-doc:: kernel/futex/core.c
:internal:
Approfondimenti
--
2.40.0
From: Huanhuan Wang <huanhuan.wang(a)corigine.com>
There are two pointers in struct xfrm_dev_offload, *dev, *real_dev.
The *dev points whether bonding interface or real interface, if
bonding IPsec offload is used, it points bonding interface; if not,
it points real interface. And *real_dev always points real interface.
So nfp should always use real_dev instead of dev.
Prior to this change the system becomes unresponsive when offloading
IPsec for a device which is a lower device to a bonding device.
Fixes: 859a497fe80c ("nfp: implement xfrm callbacks and expose ipsec offload feature to upper layer")
CC: stable(a)vger.kernel.org
Signed-off-by: Huanhuan Wang <huanhuan.wang(a)corigine.com>
Acked-by: Simon Horman <simon.horman(a)corigine.com>
Signed-off-by: Louis Peens <louis.peens(a)corigine.com>
---
drivers/net/ethernet/netronome/nfp/crypto/ipsec.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c b/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c
index c0dcce8ae437..b1f026b81dea 100644
--- a/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c
+++ b/drivers/net/ethernet/netronome/nfp/crypto/ipsec.c
@@ -269,7 +269,7 @@ static void set_sha2_512hmac(struct nfp_ipsec_cfg_add_sa *cfg, int *trunc_len)
static int nfp_net_xfrm_add_state(struct xfrm_state *x,
struct netlink_ext_ack *extack)
{
- struct net_device *netdev = x->xso.dev;
+ struct net_device *netdev = x->xso.real_dev;
struct nfp_ipsec_cfg_mssg msg = {};
int i, key_len, trunc_len, err = 0;
struct nfp_ipsec_cfg_add_sa *cfg;
@@ -513,7 +513,7 @@ static void nfp_net_xfrm_del_state(struct xfrm_state *x)
.cmd = NFP_IPSEC_CFG_MSSG_INV_SA,
.sa_idx = x->xso.offload_handle - 1,
};
- struct net_device *netdev = x->xso.dev;
+ struct net_device *netdev = x->xso.real_dev;
struct nfp_net *nn;
int err;
--
2.34.1
In upstream commit 77e52ae35463 ("futex: Move to kernel/futex/") the
futex code from kernel/futex.c was moved into kernel/futex/core in
preparation of the split-up of the implementation in various files.
Point kernel-doc references to the new files as otherwise the
documentation shows errors on build:
[...]
Error: Cannot open file ./kernel/futex.c
Error: Cannot open file ./kernel/futex.c
[...]
WARNING: kernel-doc './scripts/kernel-doc -rst -enable-lineno -sphinx-version 3.4.3 -internal ./kernel/futex.c' failed with return code 2
There is no direct upstream commit for this change. It is made in
analogy to commit bc67f1c454fb ("docs: futex: Fix kernel-doc
references") applied as consequence of the restructuring of the futex
code.
Fixes: 77e52ae35463 ("futex: Move to kernel/futex/")
Signed-off-by: Salvatore Bonaccorso <carnil(a)debian.org>
---
Documentation/kernel-hacking/locking.rst | 2 +-
Documentation/translations/it_IT/kernel-hacking/locking.rst | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/Documentation/kernel-hacking/locking.rst b/Documentation/kernel-hacking/locking.rst
index 6ed806e6061b..a6d89efede79 100644
--- a/Documentation/kernel-hacking/locking.rst
+++ b/Documentation/kernel-hacking/locking.rst
@@ -1358,7 +1358,7 @@ Mutex API reference
Futex API reference
===================
-.. kernel-doc:: kernel/futex.c
+.. kernel-doc:: kernel/futex/core.c
:internal:
Further reading
diff --git a/Documentation/translations/it_IT/kernel-hacking/locking.rst b/Documentation/translations/it_IT/kernel-hacking/locking.rst
index bf1acd6204ef..192ab8e28125 100644
--- a/Documentation/translations/it_IT/kernel-hacking/locking.rst
+++ b/Documentation/translations/it_IT/kernel-hacking/locking.rst
@@ -1400,7 +1400,7 @@ Riferimento per l'API dei Mutex
Riferimento per l'API dei Futex
===============================
-.. kernel-doc:: kernel/futex.c
+.. kernel-doc:: kernel/futex/core.c
:internal:
Approfondimenti
--
2.40.0
From: Pingfan Liu <kernelfans(a)gmail.com>
Purgatory.ro is a standalone binary that is not linked against the rest of
the kernel. Its image is copied into an array that is linked to the
kernel, and from there kexec relocates it wherever it desires.
Unlike the debug info for vmlinux, which can be used for analyzing crash
such info is useless in purgatory.ro. And discarding them can save about
200K space.
Original:
259080 kexec-purgatory.o
Stripped debug info:
29152 kexec-purgatory.o
Signed-off-by: Pingfan Liu <kernelfans(a)gmail.com>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers(a)google.com>
Reviewed-by: Steve Wahl <steve.wahl(a)hpe.com>
Acked-by: Dave Young <dyoung(a)redhat.com>
Link: https://lore.kernel.org/r/1596433788-3784-1-git-send-email-kernelfans@gmail…
(cherry picked from commit 52416ffcf823ee11aa19792715664ab94757f111)
[Alyssa: fixed for LLVM_IAS=1 by adding -g to AFLAGS_REMOVE_*]
Signed-off-by: Alyssa Ross <hi(a)alyssa.is>
---
arch/x86/purgatory/Makefile | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 9733d1cc791d..969d2b2eb7d7 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -27,7 +27,7 @@ KCOV_INSTRUMENT := n
# make up the standalone purgatory.ro
PURGATORY_CFLAGS_REMOVE := -mcmodel=kernel
-PURGATORY_CFLAGS := -mcmodel=large -ffreestanding -fno-zero-initialized-in-bss
+PURGATORY_CFLAGS := -mcmodel=large -ffreestanding -fno-zero-initialized-in-bss -g0
PURGATORY_CFLAGS += $(DISABLE_STACKLEAK_PLUGIN) -DDISABLE_BRANCH_PROFILING
# Default KBUILD_CFLAGS can have -pg option set when FTRACE is enabled. That
@@ -58,6 +58,9 @@ CFLAGS_sha256.o += $(PURGATORY_CFLAGS)
CFLAGS_REMOVE_string.o += $(PURGATORY_CFLAGS_REMOVE)
CFLAGS_string.o += $(PURGATORY_CFLAGS)
+AFLAGS_REMOVE_setup-x86_$(BITS).o += -g -Wa,-gdwarf-2
+AFLAGS_REMOVE_entry64.o += -g -Wa,-gdwarf-2
+
$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
--
2.37.1
This is based on upstream commit
d83806c4c0cc ("purgatory: fix disabling debug info"), but adapted to
the linker flags used in 5.10.y.
Since 3a260e9844c9, the linker flags can contain -g instead of
-Wa,-gdwarf-2 (when using the LLVM assembler). As a result, in that
case, debug info was being generated for the purgatory objects, even
though the intention was that it not be.
Fixes: 3a260e9844c9 ("Makefile.debug: re-enable debug info for .S files")
Signed-off-by: Alyssa Ross <hi(a)alyssa.is>
Cc: stable(a)vger.kernel.org
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
---
arch/x86/purgatory/Makefile | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 95ea17a9d20c..ebaf329a2368 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -64,8 +64,7 @@ CFLAGS_sha256.o += $(PURGATORY_CFLAGS)
CFLAGS_REMOVE_string.o += $(PURGATORY_CFLAGS_REMOVE)
CFLAGS_string.o += $(PURGATORY_CFLAGS)
-AFLAGS_REMOVE_setup-x86_$(BITS).o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_entry64.o += -Wa,-gdwarf-2
+asflags-remove-y += -g -Wa,-gdwarf-2
$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
--
2.37.1
This is based on upstream commit
d83806c4c0cc ("purgatory: fix disabling debug info"), but adapted to
the linker flags used in 5.15.y.
Since 0ee2f0567a56, the linker flags can contain -g instead of
-Wa,-gdwarf-2 (when using the LLVM assembler). As a result, in that
case, debug info was being generated for the purgatory objects, even
though the intention was that it not be.
Fixes: 0ee2f0567a56 ("Makefile.debug: re-enable debug info for .S files")
Signed-off-by: Alyssa Ross <hi(a)alyssa.is>
Cc: stable(a)vger.kernel.org
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
---
arch/x86/purgatory/Makefile | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 95ea17a9d20c..ebaf329a2368 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -64,8 +64,7 @@ CFLAGS_sha256.o += $(PURGATORY_CFLAGS)
CFLAGS_REMOVE_string.o += $(PURGATORY_CFLAGS_REMOVE)
CFLAGS_string.o += $(PURGATORY_CFLAGS)
-AFLAGS_REMOVE_setup-x86_$(BITS).o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_entry64.o += -Wa,-gdwarf-2
+asflags-remove-y += -g -Wa,-gdwarf-2
$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
--
2.37.1
Since 32ef9e5054ec, -Wa,-gdwarf-2 is no longer used in KBUILD_AFLAGS.
Instead, it includes -g, the appropriate -gdwarf-* flag, and also the
-Wa versions of both of those if building with Clang and GNU as. As a
result, debug info was being generated for the purgatory objects, even
though the intention was that it not be.
Fixes: 32ef9e5054ec ("Makefile.debug: re-enable debug info for .S files")
Signed-off-by: Alyssa Ross <hi(a)alyssa.is>
Cc: stable(a)vger.kernel.org
Acked-by: Nick Desaulniers <ndesaulniers(a)google.com>
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
(cherry picked from commit d83806c4c0cccc0d6d3c3581a11983a9c186a138)
---
arch/riscv/purgatory/Makefile | 4 +---
arch/x86/purgatory/Makefile | 3 +--
2 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/arch/riscv/purgatory/Makefile b/arch/riscv/purgatory/Makefile
index dd58e1d99397..659e21862077 100644
--- a/arch/riscv/purgatory/Makefile
+++ b/arch/riscv/purgatory/Makefile
@@ -74,9 +74,7 @@ CFLAGS_string.o += $(PURGATORY_CFLAGS)
CFLAGS_REMOVE_ctype.o += $(PURGATORY_CFLAGS_REMOVE)
CFLAGS_ctype.o += $(PURGATORY_CFLAGS)
-AFLAGS_REMOVE_entry.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_memcpy.o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_memset.o += -Wa,-gdwarf-2
+asflags-remove-y += $(foreach x, -g -gdwarf-4 -gdwarf-5, $(x) -Wa,$(x))
$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 17f09dc26381..82fec66d46d2 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -69,8 +69,7 @@ CFLAGS_sha256.o += $(PURGATORY_CFLAGS)
CFLAGS_REMOVE_string.o += $(PURGATORY_CFLAGS_REMOVE)
CFLAGS_string.o += $(PURGATORY_CFLAGS)
-AFLAGS_REMOVE_setup-x86_$(BITS).o += -Wa,-gdwarf-2
-AFLAGS_REMOVE_entry64.o += -Wa,-gdwarf-2
+asflags-remove-y += $(foreach x, -g -gdwarf-4 -gdwarf-5, $(x) -Wa,$(x))
$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
base-commit: cdc7aff9ed012801e62eedd99e4a5573eccac4db
--
2.37.1
The quilt patch titled
Subject: ia64: fix an addr to taddr in huge_pte_offset()
has been removed from the -mm tree. Its filename was
ia64-fix-an-addr-to-taddr-in-huge_pte_offset.patch
This patch was dropped because it was merged into the mm-nonmm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: ia64: fix an addr to taddr in huge_pte_offset()
Date: Sun, 16 Apr 2023 22:17:05 -0700 (PDT)
I know nothing of ia64 htlbpage_to_page(), but guess that the p4d
line should be using taddr rather than addr, like everywhere else.
Link: https://lkml.kernel.org/r/732eae88-3beb-246-2c72-281de786740@google.com
Fixes: c03ab9e32a2c ("ia64: add support for folded p4d page tables")
Signed-off-by: Hugh Dickins <hughd(a)google.com
Acked-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Acked-by: Mike Rapoport (IBM) <rppt(a)kernel.org>
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/ia64/mm/hugetlbpage.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/ia64/mm/hugetlbpage.c~ia64-fix-an-addr-to-taddr-in-huge_pte_offset
+++ a/arch/ia64/mm/hugetlbpage.c
@@ -58,7 +58,7 @@ huge_pte_offset (struct mm_struct *mm, u
pgd = pgd_offset(mm, taddr);
if (pgd_present(*pgd)) {
- p4d = p4d_offset(pgd, addr);
+ p4d = p4d_offset(pgd, taddr);
if (p4d_present(*p4d)) {
pud = pud_offset(p4d, taddr);
if (pud_present(*pud)) {
_
Patches currently in -mm which might be from hughd(a)google.com are
The quilt patch titled
Subject: mm/hugetlb: fix uffd-wp during fork()
has been removed from the -mm tree. Its filename was
mm-hugetlb-fix-uffd-wp-during-fork.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/hugetlb: fix uffd-wp during fork()
Date: Mon, 17 Apr 2023 15:53:12 -0400
Patch series "mm/hugetlb: More fixes around uffd-wp vs fork() / RO pins",
v2.
This patch (of 6):
There're a bunch of things that were wrong:
- Reading uffd-wp bit from a swap entry should use pte_swp_uffd_wp()
rather than huge_pte_uffd_wp().
- When copying over a pte, we should drop uffd-wp bit when
!EVENT_FORK (aka, when !userfaultfd_wp(dst_vma)).
- When doing early CoW for private hugetlb (e.g. when the parent page was
pinned), uffd-wp bit should be properly carried over if necessary.
No bug reported probably because most people do not even care about these
corner cases, but they are still bugs and can be exposed by the recent unit
tests introduced, so fix all of them in one shot.
Link: https://lkml.kernel.org/r/20230417195317.898696-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20230417195317.898696-2-peterx@redhat.com
Fixes: bc70fbf269fd ("mm/hugetlb: handle uffd-wp during fork()")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Mika Penttil�� <mpenttil(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-uffd-wp-during-fork
+++ a/mm/hugetlb.c
@@ -4953,11 +4953,15 @@ static bool is_hugetlb_entry_hwpoisoned(
static void
hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr,
- struct folio *new_folio)
+ struct folio *new_folio, pte_t old)
{
+ pte_t newpte = make_huge_pte(vma, &new_folio->page, 1);
+
__folio_mark_uptodate(new_folio);
hugepage_add_new_anon_rmap(new_folio, vma, addr);
- set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, &new_folio->page, 1));
+ if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old))
+ newpte = huge_pte_mkuffd_wp(newpte);
+ set_huge_pte_at(vma->vm_mm, addr, ptep, newpte);
hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm);
folio_set_hugetlb_migratable(new_folio);
}
@@ -5032,14 +5036,12 @@ again:
*/
;
} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) {
- bool uffd_wp = huge_pte_uffd_wp(entry);
-
- if (!userfaultfd_wp(dst_vma) && uffd_wp)
+ if (!userfaultfd_wp(dst_vma))
entry = huge_pte_clear_uffd_wp(entry);
set_huge_pte_at(dst, addr, dst_pte, entry);
} else if (unlikely(is_hugetlb_entry_migration(entry))) {
swp_entry_t swp_entry = pte_to_swp_entry(entry);
- bool uffd_wp = huge_pte_uffd_wp(entry);
+ bool uffd_wp = pte_swp_uffd_wp(entry);
if (!is_readable_migration_entry(swp_entry) && cow) {
/*
@@ -5050,10 +5052,10 @@ again:
swp_offset(swp_entry));
entry = swp_entry_to_pte(swp_entry);
if (userfaultfd_wp(src_vma) && uffd_wp)
- entry = huge_pte_mkuffd_wp(entry);
+ entry = pte_swp_mkuffd_wp(entry);
set_huge_pte_at(src, addr, src_pte, entry);
}
- if (!userfaultfd_wp(dst_vma) && uffd_wp)
+ if (!userfaultfd_wp(dst_vma))
entry = huge_pte_clear_uffd_wp(entry);
set_huge_pte_at(dst, addr, dst_pte, entry);
} else if (unlikely(is_pte_marker(entry))) {
@@ -5118,7 +5120,8 @@ again:
/* huge_ptep of dst_pte won't change as in child */
goto again;
}
- hugetlb_install_folio(dst_vma, dst_pte, addr, new_folio);
+ hugetlb_install_folio(dst_vma, dst_pte, addr,
+ new_folio, src_pte_old);
spin_unlock(src_ptl);
spin_unlock(dst_ptl);
continue;
@@ -5136,6 +5139,9 @@ again:
entry = huge_pte_wrprotect(entry);
}
+ if (!userfaultfd_wp(dst_vma))
+ entry = huge_pte_clear_uffd_wp(entry);
+
set_huge_pte_at(dst, addr, dst_pte, entry);
hugetlb_count_add(npages, dst);
}
_
Patches currently in -mm which might be from peterx(a)redhat.com are
On some Zhaoxin platforms, xHCI will prefetch TRB for performance
improvement. However this TRB prefetch mechanism may cross page boundary,
which may access memory not allocated by xHCI driver. In order to fix
this issue, two pages was allocated for TRB and only the first
page will be used.
Cc: stable(a)vger.kernel.org
Signed-off-by: Weitao Wang <WeitaoWang-oc(a)zhaoxin.com>
---
drivers/usb/host/xhci-mem.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index d0a9467aa5fc..d5517400d874 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -2369,8 +2369,12 @@ int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags)
* and our use of dma addresses in the trb_address_map radix tree needs
* TRB_SEGMENT_SIZE alignment, so we pick the greater alignment need.
*/
- xhci->segment_pool = dma_pool_create("xHCI ring segments", dev,
- TRB_SEGMENT_SIZE, TRB_SEGMENT_SIZE, xhci->page_size);
+ if (xhci->quirks & XHCI_ZHAOXIN_TRB_FETCH)
+ xhci->segment_pool = dma_pool_create("xHCI ring segments", dev,
+ TRB_SEGMENT_SIZE * 2, TRB_SEGMENT_SIZE * 2, xhci->page_size * 2);
+ else
+ xhci->segment_pool = dma_pool_create("xHCI ring segments", dev,
+ TRB_SEGMENT_SIZE, TRB_SEGMENT_SIZE, xhci->page_size);
/* See Table 46 and Note on Figure 55 */
xhci->device_pool = dma_pool_create("xHCI input/output contexts", dev,
--
2.32.0
commit 08d0cc5f34265d1a1e3031f319f594bd1970976c upstream.
This change is desired because without it, it has been observed that
re-applying aspm settings can cause the system to crash with certain pci
devices (ie. Genesys GL9755).
Tested by issuing 100 suspend/resume cycles on a symptomatic system running
5.15.107.
L1 settings looked identical before and after:
```
localhost ~ # lspci -vvv -d 0x17a0: | grep L1Sub
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+
L1SubCtl2: T_PwrOn=3100us
```
Cc: <stable(a)vger.kernel.org> # 5.15.y
OverCurrent condition is not standardized in the UHCI spec.
Zhaoxin UHCI controllers report OverCurrent bit active off.
In order to handle OverCurrent condition correctly, the uhci-hcd
driver needs to be told to expect the active-off behavior.
Suggested-by: Alan Stern <stern(a)rowland.harvard.edu>
Cc: stable(a)vger.kernel.org
Signed-off-by: Weitao Wang <WeitaoWang-oc(a)zhaoxin.com>
---
v1->v2
- Modify the description of this patch.
- Let Zhaoxin and VIA share a common oc_low flag
drivers/usb/host/uhci-pci.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/host/uhci-pci.c b/drivers/usb/host/uhci-pci.c
index 3592f757fe05..034586911bb5 100644
--- a/drivers/usb/host/uhci-pci.c
+++ b/drivers/usb/host/uhci-pci.c
@@ -119,11 +119,12 @@ static int uhci_pci_init(struct usb_hcd *hcd)
uhci->rh_numports = uhci_count_ports(hcd);
- /* Intel controllers report the OverCurrent bit active on.
- * VIA controllers report it active off, so we'll adjust the
- * bit value. (It's not standardized in the UHCI spec.)
+ /* Intel controllers report the OverCurrent bit active on. VIA
+ * and ZHAOXIN controllers report it active off, so we'll adjust
+ * the bit value. (It's not standardized in the UHCI spec.)
*/
- if (to_pci_dev(uhci_dev(uhci))->vendor == PCI_VENDOR_ID_VIA)
+ if (to_pci_dev(uhci_dev(uhci))->vendor == PCI_VENDOR_ID_VIA ||
+ to_pci_dev(uhci_dev(uhci))->vendor == PCI_VENDOR_ID_ZHAOXIN)
uhci->oc_low = 1;
/* HP's server management chip requires a longer port reset delay. */
--
2.32.0
Hi all,
After merging the tip tree, today's linux-next build (arm
multi_v7_defconfig) failed like this:
/tmp/next/build/kernel/time/posix-cpu-timers.c: In function 'posix_cpu_timer_wait_running_nsleep':
/tmp/next/build/kernel/time/posix-cpu-timers.c:1310:30: error: 'timr' is a pointer; did you mean to use '->'?
1310 | spin_unlock_irq(&timr.it_lock);
| ^
| ->
/tmp/next/build/kernel/time/posix-cpu-timers.c:1312:28: error: 'timr' is a pointer; did you mean to use '->'?
1312 | spin_lock_irq(&timr.it_lock);
| ^
| ->
Caused by commit
2aaae4bf41b101f7e ("posix-cpu-timers: Implement the missing timer_wait_running callback")
The !POSIX_CPU_TIMERS_TASK_WORK case wasn't fully updated. I've used
the version of the tip tree from next-20230420 instead.
The following commit has been merged into the timers/core branch of tip:
Commit-ID: f7abf14f0001a5a47539d9f60bbdca649e43536b
Gitweb: https://git.kernel.org/tip/f7abf14f0001a5a47539d9f60bbdca649e43536b
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Mon, 17 Apr 2023 15:37:55 +02:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Fri, 21 Apr 2023 15:34:33 +02:00
posix-cpu-timers: Implement the missing timer_wait_running callback
For some unknown reason the introduction of the timer_wait_running callback
missed to fixup posix CPU timers, which went unnoticed for almost four years.
Marco reported recently that the WARN_ON() in timer_wait_running()
triggers with a posix CPU timer test case.
Posix CPU timers have two execution models for expiring timers depending on
CONFIG_POSIX_CPU_TIMERS_TASK_WORK:
1) If not enabled, the expiry happens in hard interrupt context so
spin waiting on the remote CPU is reasonably time bound.
Implement an empty stub function for that case.
2) If enabled, the expiry happens in task work before returning to user
space or guest mode. The expired timers are marked as firing and moved
from the timer queue to a local list head with sighand lock held. Once
the timers are moved, sighand lock is dropped and the expiry happens in
fully preemptible context. That means the expiring task can be scheduled
out, migrated, interrupted etc. So spin waiting on it is more than
suboptimal.
The timer wheel has a timer_wait_running() mechanism for RT, which uses
a per CPU timer-base expiry lock which is held by the expiry code and the
task waiting for the timer function to complete blocks on that lock.
This does not work in the same way for posix CPU timers as there is no
timer base and expiry for process wide timers can run on any task
belonging to that process, but the concept of waiting on an expiry lock
can be used too in a slightly different way:
- Add a mutex to struct posix_cputimers_work. This struct is per task
and used to schedule the expiry task work from the timer interrupt.
- Add a task_struct pointer to struct cpu_timer which is used to store
a the task which runs the expiry. That's filled in when the task
moves the expired timers to the local expiry list. That's not
affecting the size of the k_itimer union as there are bigger union
members already
- Let the task take the expiry mutex around the expiry function
- Let the waiter acquire a task reference with rcu_read_lock() held and
block on the expiry mutex
This avoids spin-waiting on a task which might not even be on a CPU and
works nicely for RT too.
Fixes: ec8f954a40da ("posix-timers: Use a callback for cancel synchronization on PREEMPT_RT")
Reported-by: Marco Elver <elver(a)google.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Marco Elver <elver(a)google.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87zg764ojw.ffs@tglx
---
include/linux/posix-timers.h | 17 ++++---
kernel/time/posix-cpu-timers.c | 81 +++++++++++++++++++++++++++------
kernel/time/posix-timers.c | 4 ++-
3 files changed, 82 insertions(+), 20 deletions(-)
diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 2c6e99c..d607f51 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -4,6 +4,7 @@
#include <linux/spinlock.h>
#include <linux/list.h>
+#include <linux/mutex.h>
#include <linux/alarmtimer.h>
#include <linux/timerqueue.h>
@@ -62,16 +63,18 @@ static inline int clockid_to_fd(const clockid_t clk)
* cpu_timer - Posix CPU timer representation for k_itimer
* @node: timerqueue node to queue in the task/sig
* @head: timerqueue head on which this timer is queued
- * @task: Pointer to target task
+ * @pid: Pointer to target task PID
* @elist: List head for the expiry list
* @firing: Timer is currently firing
+ * @handling: Pointer to the task which handles expiry
*/
struct cpu_timer {
- struct timerqueue_node node;
- struct timerqueue_head *head;
- struct pid *pid;
- struct list_head elist;
- int firing;
+ struct timerqueue_node node;
+ struct timerqueue_head *head;
+ struct pid *pid;
+ struct list_head elist;
+ int firing;
+ struct task_struct __rcu *handling;
};
static inline bool cpu_timer_enqueue(struct timerqueue_head *head,
@@ -135,10 +138,12 @@ struct posix_cputimers {
/**
* posix_cputimers_work - Container for task work based posix CPU timer expiry
* @work: The task work to be scheduled
+ * @mutex: Mutex held around expiry in context of this task work
* @scheduled: @work has been scheduled already, no further processing
*/
struct posix_cputimers_work {
struct callback_head work;
+ struct mutex mutex;
unsigned int scheduled;
};
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 2f5e9b3..e9c6f9d 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -846,6 +846,8 @@ static u64 collect_timerqueue(struct timerqueue_head *head,
return expires;
ctmr->firing = 1;
+ /* See posix_cpu_timer_wait_running() */
+ rcu_assign_pointer(ctmr->handling, current);
cpu_timer_dequeue(ctmr);
list_add_tail(&ctmr->elist, firing);
}
@@ -1161,7 +1163,49 @@ static void handle_posix_cpu_timers(struct task_struct *tsk);
#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
static void posix_cpu_timers_work(struct callback_head *work)
{
+ struct posix_cputimers_work *cw = container_of(work, typeof(*cw), work);
+
+ mutex_lock(&cw->mutex);
handle_posix_cpu_timers(current);
+ mutex_unlock(&cw->mutex);
+}
+
+/*
+ * Invoked from the posix-timer core when a cancel operation failed because
+ * the timer is marked firing. The caller holds rcu_read_lock(), which
+ * protects the timer and the task which is expiring it from being freed.
+ */
+static void posix_cpu_timer_wait_running(struct k_itimer *timr)
+{
+ struct task_struct *tsk = rcu_dereference(timr->it.cpu.handling);
+
+ /* Has the handling task completed expiry already? */
+ if (!tsk)
+ return;
+
+ /* Ensure that the task cannot go away */
+ get_task_struct(tsk);
+ /* Now drop the RCU protection so the mutex can be locked */
+ rcu_read_unlock();
+ /* Wait on the expiry mutex */
+ mutex_lock(&tsk->posix_cputimers_work.mutex);
+ /* Release it immediately again. */
+ mutex_unlock(&tsk->posix_cputimers_work.mutex);
+ /* Drop the task reference. */
+ put_task_struct(tsk);
+ /* Relock RCU so the callsite is balanced */
+ rcu_read_lock();
+}
+
+static void posix_cpu_timer_wait_running_nsleep(struct k_itimer *timr)
+{
+ /* Ensure that timr->it.cpu.handling task cannot go away */
+ rcu_read_lock();
+ spin_unlock_irq(&timr->it_lock);
+ posix_cpu_timer_wait_running(timr);
+ rcu_read_unlock();
+ /* @timr is on stack and is valid */
+ spin_lock_irq(&timr->it_lock);
}
/*
@@ -1177,6 +1221,7 @@ void clear_posix_cputimers_work(struct task_struct *p)
sizeof(p->posix_cputimers_work.work));
init_task_work(&p->posix_cputimers_work.work,
posix_cpu_timers_work);
+ mutex_init(&p->posix_cputimers_work.mutex);
p->posix_cputimers_work.scheduled = false;
}
@@ -1255,6 +1300,18 @@ static inline void __run_posix_cpu_timers(struct task_struct *tsk)
lockdep_posixtimer_exit();
}
+static void posix_cpu_timer_wait_running(struct k_itimer *timr)
+{
+ cpu_relax();
+}
+
+static void posix_cpu_timer_wait_running_nsleep(struct k_itimer *timr)
+{
+ spin_unlock_irq(&timr->it_lock);
+ cpu_relax();
+ spin_lock_irq(&timr->it_lock);
+}
+
static inline bool posix_cpu_timers_work_scheduled(struct task_struct *tsk)
{
return false;
@@ -1363,6 +1420,8 @@ static void handle_posix_cpu_timers(struct task_struct *tsk)
*/
if (likely(cpu_firing >= 0))
cpu_timer_fire(timer);
+ /* See posix_cpu_timer_wait_running() */
+ rcu_assign_pointer(timer->it.cpu.handling, NULL);
spin_unlock(&timer->it_lock);
}
}
@@ -1497,23 +1556,16 @@ static int do_cpu_nanosleep(const clockid_t which_clock, int flags,
expires = cpu_timer_getexpires(&timer.it.cpu);
error = posix_cpu_timer_set(&timer, 0, &zero_it, &it);
if (!error) {
- /*
- * Timer is now unarmed, deletion can not fail.
- */
+ /* Timer is now unarmed, deletion can not fail. */
posix_cpu_timer_del(&timer);
+ } else {
+ while (error == TIMER_RETRY) {
+ posix_cpu_timer_wait_running_nsleep(&timer);
+ error = posix_cpu_timer_del(&timer);
+ }
}
- spin_unlock_irq(&timer.it_lock);
- while (error == TIMER_RETRY) {
- /*
- * We need to handle case when timer was or is in the
- * middle of firing. In other cases we already freed
- * resources.
- */
- spin_lock_irq(&timer.it_lock);
- error = posix_cpu_timer_del(&timer);
- spin_unlock_irq(&timer.it_lock);
- }
+ spin_unlock_irq(&timer.it_lock);
if ((it.it_value.tv_sec | it.it_value.tv_nsec) == 0) {
/*
@@ -1623,6 +1675,7 @@ const struct k_clock clock_posix_cpu = {
.timer_del = posix_cpu_timer_del,
.timer_get = posix_cpu_timer_get,
.timer_rearm = posix_cpu_timer_rearm,
+ .timer_wait_running = posix_cpu_timer_wait_running,
};
const struct k_clock clock_process = {
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 0c8a87a..808a247 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -846,6 +846,10 @@ static struct k_itimer *timer_wait_running(struct k_itimer *timer,
rcu_read_lock();
unlock_timer(timer, *flags);
+ /*
+ * kc->timer_wait_running() might drop RCU lock. So @timer
+ * cannot be touched anymore after the function returns!
+ */
if (!WARN_ON_ONCE(!kc->timer_wait_running))
kc->timer_wait_running(timer);
Thomas,
Here's a small collection of irqchip patches for 6.4. The only
significant thing is the RISC-V IPI rework, which spans both the
irqchip subsystem and the arch code (and is Acked by Palmer).
The rest is a bunch of errata workarounds, fixes and cleanups.
Please pull,
M.
The following changes since commit 197b6b60ae7bc51dd0814953c562833143b292aa:
Linux 6.3-rc4 (2023-03-26 14:40:20 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git tags/irqchip-6.4
for you to fetch changes up to 2ff1b0839ddd514be4752c64c1c6facf91ff3a56:
Merge branch irq/misc-6.4 into irq/irqchip-next (2023-04-21 14:05:31 +0100)
----------------------------------------------------------------
irqchip changes for 6.4
- Large RISC-V IPI rework to make way for a new interrupt
architecture
- More Loongarch fixes from Lianmin Lv, fixing issues in the so
called "dual-bridge" systems.
- Workaround for the nvidia T241 chip that gets confused in
3 and 4 socket configurations, leading to the GIC
malfunctionning in some contexts
- Drop support for non-firmware driven GIC configurarations
now that the old ARM11MP Cavium board is gone
- Workaround for the Rockchip 3588 chip that doesn't
correctly deal with the shareability attributes.
- Replace uses of of_find_property() with the more appropriate
of_property_read_bool()
- Make bcm-6345-l1 request its MMIO region
- Add suspend support to the SiFive PLIC
- Drop support for stih415, stih416 and stid127 platforms
----------------------------------------------------------------
Alain Volmat (1):
irqchip/st: Remove stih415/stih416 and stid127 platforms support
Anup Patel (7):
RISC-V: Clear SIP bit only when using SBI IPI operations
irqchip/riscv-intc: Allow drivers to directly discover INTC hwnode
RISC-V: Treat IPIs as normal Linux IRQs
RISC-V: Allow marking IPIs as suitable for remote FENCEs
RISC-V: Use IPIs for remote TLB flush when possible
RISC-V: Use IPIs for remote icache flush when possible
irqchip/riscv-intc: Add empty irq_eoi() for chained irq handlers
Jianmin Lv (5):
irqchip/loongson-eiointc: Fix returned value on parsing MADT
irqchip/loongson-eiointc: Fix incorrect use of acpi_get_vec_parent
irqchip/loongson-eiointc: Fix registration of syscore_ops
irqchip/loongson-pch-pic: Fix registration of syscore_ops
irqchip/loongson-pch-pic: Fix pch_pic_acpi_init calling
Marc Zyngier (5):
irqchip/gic: Drop support for board files
Merge branch irq/gic-6.4 into irq/irqchip-next
Merge branch irq/riscv-ipi into irq/irqchip-next
Merge branch irq/loongarch-fixes-6.4 into irq/irqchip-next
Merge branch irq/misc-6.4 into irq/irqchip-next
Mason Huo (1):
irqchip/irq-sifive-plic: Add syscore callbacks for hibernation
Rob Herring (1):
irqchip: Use of_property_read_bool() for boolean properties
Sebastian Reichel (1):
irqchip/gic-v3: Add Rockchip 3588001 erratum workaround
Shanker Donthineni (1):
irqchip/gicv3: Workaround for NVIDIA erratum T241-FABRIC-4
Álvaro Fernández Rojas (1):
irqchip/bcm-6345-l1: Request memory region
Documentation/arm64/silicon-errata.rst | 5 +
arch/arm64/Kconfig | 10 ++
arch/riscv/Kconfig | 2 +
arch/riscv/include/asm/irq.h | 4 +
arch/riscv/include/asm/sbi.h | 9 +-
arch/riscv/include/asm/smp.h | 49 +++++++---
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/cpu-hotplug.c | 3 +-
arch/riscv/kernel/irq.c | 21 +++-
arch/riscv/kernel/sbi-ipi.c | 77 +++++++++++++++
arch/riscv/kernel/sbi.c | 100 +++----------------
arch/riscv/kernel/smp.c | 171 +++++++++++++++++----------------
arch/riscv/kernel/smpboot.c | 5 +-
arch/riscv/mm/cacheflush.c | 5 +-
arch/riscv/mm/tlbflush.c | 93 +++++++++++++++---
drivers/clocksource/timer-clint.c | 65 ++++++++++---
drivers/firmware/smccc/smccc.c | 26 +++++
drivers/firmware/smccc/soc_id.c | 28 +-----
drivers/irqchip/Kconfig | 3 +
drivers/irqchip/irq-bcm6345-l1.c | 6 +-
drivers/irqchip/irq-csky-apb-intc.c | 2 +-
drivers/irqchip/irq-gic-v2m.c | 2 +-
drivers/irqchip/irq-gic-v3-its.c | 35 +++++++
drivers/irqchip/irq-gic-v3.c | 115 +++++++++++++++++++---
drivers/irqchip/irq-gic.c | 60 +-----------
drivers/irqchip/irq-loongson-eiointc.c | 32 ++++--
drivers/irqchip/irq-loongson-pch-pic.c | 6 +-
drivers/irqchip/irq-riscv-intc.c | 71 ++++++++------
drivers/irqchip/irq-sifive-plic.c | 93 +++++++++++++++++-
drivers/irqchip/irq-st.c | 15 ---
include/linux/arm-smccc.h | 18 ++++
include/linux/irqchip/arm-gic.h | 6 --
32 files changed, 761 insertions(+), 377 deletions(-)
create mode 100644 arch/riscv/kernel/sbi-ipi.c
From: Kornel Dulęba <korneld(a)chromium.org>
Leverage gpiochip_line_is_irq to check whether a pin has an irq
associated with it. The previous check ("irq == 0") didn't make much
sense. The irq variable refers to the pinctrl irq, and has nothing do to
with an individual pin.
On some systems, during suspend/resume cycle, the firmware leaves
an interrupt enabled on a pin that is not used by the kernel.
Without this patch that caused an interrupt storm.
Cc: stable(a)vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217315
Signed-off-by: Kornel Dulęba <korneld(a)chromium.org>
Reviewed-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
drivers/pinctrl/pinctrl-amd.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/pinctrl/pinctrl-amd.c b/drivers/pinctrl/pinctrl-amd.c
index 24465010397b..675c9826b78a 100644
--- a/drivers/pinctrl/pinctrl-amd.c
+++ b/drivers/pinctrl/pinctrl-amd.c
@@ -660,21 +660,21 @@ static bool do_amd_gpio_irq_handler(int irq, void *dev_id)
* We must read the pin register again, in case the
* value was changed while executing
* generic_handle_domain_irq() above.
- * If we didn't find a mapping for the interrupt,
- * disable it in order to avoid a system hang caused
- * by an interrupt storm.
+ * If the line is not an irq, disable it in order to
+ * avoid a system hang caused by an interrupt storm.
*/
raw_spin_lock_irqsave(&gpio_dev->lock, flags);
regval = readl(regs + i);
- if (irq == 0) {
- regval &= ~BIT(INTERRUPT_ENABLE_OFF);
+ if (!gpiochip_line_is_irq(gc, irqnr + i)) {
+ regval &= ~BIT(INTERRUPT_MASK_OFF);
dev_dbg(&gpio_dev->pdev->dev,
"Disabling spurious GPIO IRQ %d\n",
irqnr + i);
+ } else {
+ ret = true;
}
writel(regval, regs + i);
raw_spin_unlock_irqrestore(&gpio_dev->lock, flags);
- ret = true;
}
}
/* did not cause wake on resume context for shared IRQ */
--
2.34.1
commit 4e5a04be88fe ("pinctrl: amd: disable and mask interrupts on probe")
had a mistake in loop iteration 63 that it would clear offset 0xFC instead
of 0x100. Offset 0xFC is actually `WAKE_INT_MASTER_REG`. This was
clearing bits 13 and 15 from the register which significantly changed the
expected handling for some platforms for GPIO0.
commit b26cd9325be4 ("pinctrl: amd: Disable and mask interrupts on resume")
actually fixed this bug, but lead to regressions on Lenovo Z13 and some
other systems. This is because there was no handling in the driver for bit
15 debounce behavior.
Quoting a public BKDG:
```
EnWinBlueBtn. Read-write. Reset: 0. 0=GPIO0 detect debounced power button;
Power button override is 4 seconds. 1=GPIO0 detect debounced power button
in S3/S5/S0i3, and detect "pressed less than 2 seconds" and "pressed 2~10
seconds" in S0; Power button override is 10 seconds
```
Cross referencing the same master register in Windows it's obvious that
Windows doesn't use debounce values in this configuration. So align the
Linux driver to do this as well. This fixes wake on lid when
WAKE_INT_MASTER_REG is properly programmed.
Cc: stable(a)vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217315
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
drivers/pinctrl/pinctrl-amd.c | 7 +++++++
drivers/pinctrl/pinctrl-amd.h | 1 +
2 files changed, 8 insertions(+)
diff --git a/drivers/pinctrl/pinctrl-amd.c b/drivers/pinctrl/pinctrl-amd.c
index c250110f6775..6b9ae92017d4 100644
--- a/drivers/pinctrl/pinctrl-amd.c
+++ b/drivers/pinctrl/pinctrl-amd.c
@@ -125,6 +125,12 @@ static int amd_gpio_set_debounce(struct gpio_chip *gc, unsigned offset,
struct amd_gpio *gpio_dev = gpiochip_get_data(gc);
raw_spin_lock_irqsave(&gpio_dev->lock, flags);
+
+ /* Use special handling for Pin0 debounce */
+ pin_reg = readl(gpio_dev->base + WAKE_INT_MASTER_REG);
+ if (pin_reg & INTERNAL_GPIO0_DEBOUNCE)
+ debounce = 0;
+
pin_reg = readl(gpio_dev->base + offset * 4);
if (debounce) {
@@ -219,6 +225,7 @@ static void amd_gpio_dbg_show(struct seq_file *s, struct gpio_chip *gc)
char *debounce_enable;
char *wake_cntrlz;
+ seq_printf(s, "WAKE_INT_MASTER_REG: 0x%08x\n", readl(gpio_dev->base + WAKE_INT_MASTER_REG));
for (bank = 0; bank < gpio_dev->hwbank_num; bank++) {
unsigned int time = 0;
unsigned int unit = 0;
diff --git a/drivers/pinctrl/pinctrl-amd.h b/drivers/pinctrl/pinctrl-amd.h
index 81ae8319a1f0..1cf2d06bbd8c 100644
--- a/drivers/pinctrl/pinctrl-amd.h
+++ b/drivers/pinctrl/pinctrl-amd.h
@@ -17,6 +17,7 @@
#define AMD_GPIO_PINS_BANK3 32
#define WAKE_INT_MASTER_REG 0xfc
+#define INTERNAL_GPIO0_DEBOUNCE (1 << 15)
#define EOI_MASK (1 << 29)
#define WAKE_INT_STATUS_REG0 0x2f8
--
2.34.1
Currently, on a handful of ASICs. We allow the framebuffer for a given
plane to exist in either VRAM or GTT. However, if the plane's new
framebuffer is in a different memory domain than it's previous
framebuffer, flipping between them can cause the screen to flicker. So,
to fix this, don't perform an immediate flip in the aforementioned case.
Cc: stable(a)vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2354
Reviewed-by: Roman Li <Roman.Li(a)amd.com>
Fixes: 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)")
Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
---
v2: make a number of clarifications to the commit message and drop
locking.
v3: use a stronger check
v4: drop mem_type
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index dfcb9815b5a8..76a776fd8437 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -7900,6 +7900,13 @@ static void amdgpu_dm_commit_cursors(struct drm_atomic_state *state)
amdgpu_dm_plane_handle_cursor_update(plane, old_plane_state);
}
+static inline uint32_t get_mem_type(struct drm_framebuffer *fb)
+{
+ struct amdgpu_bo *abo = gem_to_amdgpu_bo(fb->obj[0]);
+
+ return abo->tbo.resource ? abo->tbo.resource->mem_type : 0;
+}
+
static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
struct dc_state *dc_state,
struct drm_device *dev,
@@ -8042,11 +8049,13 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
/*
* Only allow immediate flips for fast updates that don't
- * change FB pitch, DCC state, rotation or mirroing.
+ * change memory domain, FB pitch, DCC state, rotation or
+ * mirroring.
*/
bundle->flip_addrs[planes_count].flip_immediate =
crtc->state->async_flip &&
- acrtc_state->update_type == UPDATE_TYPE_FAST;
+ acrtc_state->update_type == UPDATE_TYPE_FAST &&
+ get_mem_type(old_plane_state->fb) == get_mem_type(fb);
timestamp_ns = ktime_get_ns();
bundle->flip_addrs[planes_count].flip_timestamp_in_us = div_u64(timestamp_ns, 1000);
--
2.40.0
Until now, the page table walker counted increments to the PA and IPA
of a walk in two separate places. While the PA is incremented as soon as
a leaf PTE is installed in stage2_map_walker_try_leaf(), the IPA is
actually bumped in the generic table walker context. Critically,
__kvm_pgtable_visit() rereads the PTE after the LEAF callback returns
to work out if a table or leaf was installed, and only bumps the IPA for
a leaf PTE.
This arrangement worked fine when we handled faults behind the write lock,
as the walker had exclusive access to the stage-2 page tables. However,
commit 1577cb5823ce ("KVM: arm64: Handle stage-2 faults in parallel")
started handling all stage-2 faults behind the read lock, opening up a
race where a walker could increment the PA but not the IPA of a walk.
Nothing good ensues, as the walker starts mapping with the incorrect
IPA -> PA relationship.
For example, assume that two vCPUs took a data abort on the same IPA.
One observes that dirty logging is disabled, and the other observed that
it is enabled:
vCPU attempting PMD mapping vCPU attempting PTE mapping
====================================== =====================================
/* install PMD */
stage2_make_pte(ctx, leaf);
data->phys += granule;
/* replace PMD with a table */
stage2_try_break_pte(ctx, data->mmu);
stage2_make_pte(ctx, table);
/* table is observed */
ctx.old = READ_ONCE(*ptep);
table = kvm_pte_table(ctx.old, level);
/*
* map walk continues w/o incrementing
* IPA.
*/
__kvm_pgtable_walk(..., level + 1);
Bring an end to the whole mess by using the IPA as the single source of
truth for how far along a walk has gotten. Work out the correct PA to
map by calculating the IPA offset from the beginning of the walk and add
that to the starting physical address.
Cc: stable(a)vger.kernel.org
Fixes: 1577cb5823ce ("KVM: arm64: Handle stage-2 faults in parallel")
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
---
arch/arm64/include/asm/kvm_pgtable.h | 1 +
arch/arm64/kvm/hyp/pgtable.c | 32 ++++++++++++++++++++++++----
2 files changed, 29 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 4cd6762bda80..dc3c072e862f 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -209,6 +209,7 @@ struct kvm_pgtable_visit_ctx {
kvm_pte_t old;
void *arg;
struct kvm_pgtable_mm_ops *mm_ops;
+ u64 start;
u64 addr;
u64 end;
u32 level;
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 3d61bd3e591d..140f82300db5 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -58,6 +58,7 @@
struct kvm_pgtable_walk_data {
struct kvm_pgtable_walker *walker;
+ u64 start;
u64 addr;
u64 end;
};
@@ -201,6 +202,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
.old = READ_ONCE(*ptep),
.arg = data->walker->arg,
.mm_ops = mm_ops,
+ .start = data->start,
.addr = data->addr,
.end = data->end,
.level = level,
@@ -293,6 +295,7 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
struct kvm_pgtable_walker *walker)
{
struct kvm_pgtable_walk_data walk_data = {
+ .start = ALIGN_DOWN(addr, PAGE_SIZE),
.addr = ALIGN_DOWN(addr, PAGE_SIZE),
.end = PAGE_ALIGN(walk_data.addr + size),
.walker = walker,
@@ -794,20 +797,43 @@ static bool stage2_pte_executable(kvm_pte_t pte)
return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
}
+static u64 stage2_map_walker_phys_addr(const struct kvm_pgtable_visit_ctx *ctx,
+ const struct stage2_map_data *data)
+{
+ u64 phys = data->phys;
+
+ /*
+ * Stage-2 walks to update ownership data are communicated to the map
+ * walker using an invalid PA. Avoid offsetting an already invalid PA,
+ * which could overflow and make the address valid again.
+ */
+ if (!kvm_phys_is_valid(phys))
+ return phys;
+
+ /*
+ * Otherwise, work out the correct PA based on how far the walk has
+ * gotten.
+ */
+ return phys + (ctx->addr - ctx->start);
+}
+
static bool stage2_leaf_mapping_allowed(const struct kvm_pgtable_visit_ctx *ctx,
struct stage2_map_data *data)
{
+ u64 phys = stage2_map_walker_phys_addr(ctx, data);
+
if (data->force_pte && (ctx->level < (KVM_PGTABLE_MAX_LEVELS - 1)))
return false;
- return kvm_block_mapping_supported(ctx, data->phys);
+ return kvm_block_mapping_supported(ctx, phys);
}
static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
struct stage2_map_data *data)
{
kvm_pte_t new;
- u64 granule = kvm_granule_size(ctx->level), phys = data->phys;
+ u64 phys = stage2_map_walker_phys_addr(ctx, data);
+ u64 granule = kvm_granule_size(ctx->level);
struct kvm_pgtable *pgt = data->mmu->pgt;
struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
@@ -841,8 +867,6 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
stage2_make_pte(ctx, new);
- if (kvm_phys_is_valid(phys))
- data->phys += granule;
return 0;
}
--
2.40.0.634.g4ca3ef3211-goog
--
Hello Dear,
how are you today? I hope you are fine
My name is Dr. Ava Smith, I Am an English and French nationality.
I will give you pictures and more details about me as soon as I hear from you
Thanks
Ava
Do not call gadget stop until the poll for controller halt is
completed. DEVTEN is cleared as part of gadget stop, so the intention to
allow ep0 events to continue while waiting for controller halt is not
happening.
Fixes: c96683798e27 ("usb: dwc3: ep0: Don't prepare beyond Setup stage")
Cc: stable(a)vger.kernel.org
Acked-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Signed-off-by: Wesley Cheng <quic_wcheng(a)quicinc.com>
---
drivers/usb/dwc3/gadget.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 9f492c8a7d0b..dd6057bad37e 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2637,7 +2637,6 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
* bit.
*/
dwc3_stop_active_transfers(dwc);
- __dwc3_gadget_stop(dwc);
spin_unlock_irqrestore(&dwc->lock, flags);
/*
@@ -2674,7 +2673,19 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
* remaining event generated by the controller while polling for
* DSTS.DEVCTLHLT.
*/
- return dwc3_gadget_run_stop(dwc, false);
+ ret = dwc3_gadget_run_stop(dwc, false);
+
+ /*
+ * Stop the gadget after controller is halted, so that if needed, the
+ * events to update EP0 state can still occur while the run/stop
+ * routine polls for the halted state. DEVTEN is cleared as part of
+ * gadget stop.
+ */
+ spin_lock_irqsave(&dwc->lock, flags);
+ __dwc3_gadget_stop(dwc);
+ spin_unlock_irqrestore(&dwc->lock, flags);
+
+ return ret;
}
static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
Hi,
I would like to see these patches backported. They are needed so perf
can be cross compiled with gcc on v5.15, v5.10 and v5.4.
I built it with tuxmake [1] here are two example commandlines:
tuxmake --runtime podman --target-arch arm64 --toolchain gcc-12 --kconfig defconfig perf
tuxmake --runtime podman --target-arch x86_64 --toolchain gcc-12 --kconfig defconfig perf
Tried to build perf with both gcc-11 and gcc-12.
Patch 'tools perf: Fix compilation error with new binutils'
and 'tools build: Add feature test for init_disassemble_info API changes'
didn't apply cleanly, thats why I send these in a patchset.
When apply 'tools build: Add feature test for
init_disassemble_info API changes' to 5.4 it will be a minor merge
conflict, do you want me to send this patch in two separate patches one
for 5.4 and another for v5.10?
The sha for these two patches in mainline are.
cfd59ca91467 tools build: Add feature test for init_disassemble_info API changes
83aa0120487e tools perf: Fix compilation error with new binutils
The above patches solves these:
util/annotate.c: In function 'symbol__disassemble_bpf':
util/annotate.c:1729:9: error: too few arguments to function 'init_disassemble_info'
1729 | init_disassemble_info(&info, s,
| ^~~~~~~~~~~~~~~~~~~~~
Please apply these to v5.10 and v5.4
a45b3d692623 tools include: add dis-asm-compat.h to handle version differences
d08c84e01afa perf sched: Cast PTHREAD_STACK_MIN to int as it may turn into sysconf(__SC_THREAD_STACK>
The above patches solves these:
/home/anders/src/kernel/stable-5.10/tools/include/linux/kernel.h:43:24: error: comparison of distinct pointer types lacks a cast [-Werror]
43 | (void) (&_max1 == &_max2); \
| ^~
builtin-sched.c:673:34: note: in expansion of macro 'max'
673 | (size_t) max(16 * 1024, PTHREAD_STACK_MIN));
| ^~~
Please apply these to v5.15, v5.10 and v5.4
8e8bf60a6754 perf build: Fixup disabling of -Wdeprecated-declarations for the python scripting engine
4ee3c4da8b1b perf scripting python: Do not build fail on deprecation warnings
63a4354ae75c perf scripting perl: Ignore some warnings to keep building with perl headers
Build error that the above 3 patches solves are:
/usr/lib/x86_64-linux-gnu/perl/5.36/CORE/handy.h:125:23: error: cast from function call of type 'STRLEN' {aka 'long unsigned int'} to non-matching type '_Bool' [-Werror=bad-function-cast]
125 | #define cBOOL(cbool) ((bool) (cbool))
| ^
Cheers,
Anders
[1] https://tuxmake.org/
Andres Freund (2):
tools perf: Fix compilation error with new binutils
tools build: Add feature test for init_disassemble_info API changes
tools/build/Makefile.feature | 1 +
tools/build/feature/Makefile | 4 ++++
tools/build/feature/test-all.c | 4 ++++
tools/build/feature/test-disassembler-init-styled.c | 13 +++++++++++++
tools/perf/Makefile.config | 8 ++++++++
tools/perf/util/annotate.c | 7 ++++---
6 files changed, 34 insertions(+), 3 deletions(-)
create mode 100644 tools/build/feature/test-disassembler-init-styled.c
--
2.39.2
I'm announcing the release of the 4.19.281 kernel.
All users of the 4.19 kernel series must upgrade.
The updated 4.19.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.19.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Documentation/sound/hd-audio/models.rst | 2
Makefile | 2
arch/arm64/kvm/guest.c | 83 +++++++++++++++++++-----
arch/x86/kernel/sysfb_efi.c | 8 ++
arch/x86/kvm/vmx/vmx.c | 10 ++
arch/x86/pci/fixup.c | 21 ++++++
crypto/asymmetric_keys/verify_pefile.c | 12 ++-
drivers/gpio/gpio-davinci.c | 2
drivers/hwtracing/coresight/coresight-etm4x.c | 2
drivers/i2c/busses/i2c-imx-lpi2c.c | 2
drivers/iio/dac/cio-dac.c | 4 -
drivers/mtd/mtdblock.c | 12 ++-
drivers/mtd/ubi/build.c | 21 ++++--
drivers/mtd/ubi/wl.c | 5 -
drivers/net/ethernet/cadence/macb_main.c | 4 +
drivers/net/ethernet/qlogic/qlcnic/qlcnic_ctx.c | 8 ++
drivers/net/ethernet/sun/niu.c | 2
drivers/pinctrl/pinctrl-amd.c | 56 ++++++++++++----
drivers/power/supply/cros_usbpd-charger.c | 2
drivers/pwm/pwm-cros-ec.c | 1
drivers/scsi/ses.c | 20 ++---
drivers/tty/serial/sh-sci.c | 9 ++
drivers/usb/serial/cp210x.c | 1
drivers/usb/serial/option.c | 10 ++
drivers/watchdog/sbsa_gwdt.c | 1
fs/nfs/nfs4_fs.h | 2
fs/nfs/nfs4proc.c | 25 ++++---
fs/nfs/nfs4state.c | 8 +-
fs/nilfs2/segment.c | 3
fs/nilfs2/super.c | 2
fs/nilfs2/the_nilfs.c | 12 ++-
include/linux/ftrace.h | 2
kernel/cgroup/cpuset.c | 4 -
kernel/events/core.c | 2
kernel/trace/ring_buffer.c | 13 +++
mm/swapfile.c | 3
net/9p/trans_xen.c | 4 +
net/bluetooth/hidp/core.c | 2
net/bluetooth/l2cap_core.c | 24 +-----
net/core/netpoll.c | 19 +++++
net/ipv4/icmp.c | 5 +
net/ipv6/ip6_output.c | 7 +-
net/ipv6/udp.c | 8 +-
net/mac80211/sta_info.c | 3
net/sctp/socket.c | 4 +
net/sctp/stream_interleave.c | 3
sound/i2c/cs8427.c | 7 +-
sound/pci/emu10k1/emupcm.c | 4 -
sound/pci/hda/patch_realtek.c | 1
sound/pci/hda/patch_sigmatel.c | 10 ++
50 files changed, 349 insertions(+), 128 deletions(-)
Alexander Stein (1):
i2c: imx-lpi2c: clean rx/tx buffers upon new message
Bang Li (1):
mtdblock: tolerate corrected bit-flips
Basavaraj Natikar (1):
x86/PCI: Add quirk for AMD XHCI controller that loses MSI-X state in D3hot
Biju Das (2):
tty: serial: sh-sci: Fix transmit end interrupt handler
tty: serial: sh-sci: Fix Rx on RZ/G2L SCI
Bjørn Mork (1):
USB: serial: option: add Quectel RM500U-CN modem
Dave Martin (2):
KVM: arm64: Factor out core register ID enumeration
KVM: arm64: Filter out invalid core register IDs in KVM_GET_REG_LIST
Denis Plotnikov (1):
qlcnic: check pci_reset_function result
Dhruva Gole (1):
gpio: davinci: Add irq chip flag to skip set wake
Enrico Sau (1):
USB: serial: option: add Telit FE990 compositions
Eric Dumazet (2):
icmp: guard against too small mtu
udp6: fix potential access to stale information
Felix Fietkau (1):
wifi: mac80211: fix invalid drv_sta_pre_rcu_remove calls for non-uploaded sta
George Cherian (1):
watchdog: sbsa_wdog: Make sure the timeout programming is within the limits
Grant Grundler (1):
power: supply: cros_usbpd: reclassify "default case!" as debug
Greg Kroah-Hartman (1):
Linux 4.19.281
Hans de Goede (1):
efi: sysfb_efi: Add quirk for Lenovo Yoga Book X91F/L
Harshit Mogalapalli (1):
niu: Fix missing unwind goto in niu_alloc_channels()
Jakub Kicinski (1):
net: don't let netpoll invoke NAPI if in xmit context
Jeremy Soller (1):
ALSA: hda/realtek: Add quirk for Clevo X370SNW
Jiri Kosina (1):
scsi: ses: Handle enclosure with just a primary component gracefully
John Keeping (1):
ftrace: Mark get_lock_parent_ip() __always_inline
Kan Liang (1):
perf/core: Fix the same task check in perf_event_set_output
Kees Jan Koster (1):
USB: serial: cp210x: add Silicon Labs IFS-USB-DATACABLE IDs
Kornel Dulęba (2):
pinctrl: amd: Disable and mask interrupts on resume
Revert "pinctrl: amd: Disable and mask interrupts on resume"
Lee Jones (1):
mtd: ubi: wl: Fix a couple of kernel-doc issues
Linus Walleij (1):
pinctrl: amd: Use irqchip template
Luiz Augusto von Dentz (1):
Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
Marc Zyngier (1):
arm64: KVM: Fix system register enumeration
Min Li (1):
Bluetooth: Fix race condition in hidp_session_thread
Oswald Buddenhagen (4):
ALSA: emu10k1: fix capture interrupt handler unlinking
ALSA: hda/sigmatel: add pin overrides for Intel DP45SG motherboard
ALSA: i2c/cs8427: fix iec958 mixer control deactivation
ALSA: hda/sigmatel: fix S/PDIF out on Intel D*45* motherboards
Paolo Bonzini (1):
KVM: nVMX: add missing consistency checks for CR0 and CR4
Robbie Harwood (1):
verify_pefile: relax wrapper length check
Roman Gushchin (1):
net: macb: fix a memory corruption in extended buffer descriptor mode
Rongwei Wang (1):
mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
Ryusuke Konishi (2):
nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
nilfs2: fix sysfs interface lifetime
Sachi King (1):
pinctrl: amd: disable and mask interrupts on probe
Sandeep Singh (1):
pinctrl: Added IRQF_SHARED flag for amd-pinctrl driver
Steve Clevenger (1):
coresight-etm4: Fix for() loop drvdata->nr_addr_cmp range bug
Trond Myklebust (3):
NFSv4: Convert struct nfs4_state to use refcount_t
NFSv4: Check the return value of update_open_stateid()
NFSv4: Fix hangs when recovering open state after a server reboot
Uwe Kleine-König (1):
pwm: cros-ec: Explicitly set .polarity in .get_state()
Waiman Long (1):
cgroup/cpuset: Wake up cpuset_attach_wq tasks in cpuset_cancel_attach()
William Breathitt Gray (1):
iio: dac: cio-dac: Fix max DAC write value check for 12-bit
Xin Long (2):
sctp: check send stream number after wait_for_sndbuf
sctp: fix a potential overflow in sctp_ifwdtsn_skip
ZhaoLong Wang (1):
ubi: Fix deadlock caused by recursively holding work_sem
Zheng Wang (1):
9p/xen : Fix use after free bug in xen_9pfs_front_remove due to race condition
Zheng Yejian (1):
ring-buffer: Fix race while reader and writer are on the same page
Zhihao Cheng (1):
ubi: Fix failure attaching when vid_hdr offset equals to (sub)page size
Ziyang Xuan (1):
ipv6: Fix an uninit variable access bug in __ip6_make_skb()
The following commit has been merged into the timers/core branch of tip:
Commit-ID: 2aaae4bf41b101f7e58e8b06778b1cd9a1dddf94
Gitweb: https://git.kernel.org/tip/2aaae4bf41b101f7e58e8b06778b1cd9a1dddf94
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Mon, 17 Apr 2023 15:37:55 +02:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Thu, 20 Apr 2023 09:47:26 +02:00
posix-cpu-timers: Implement the missing timer_wait_running callback
For some unknown reason the introduction of the timer_wait_running callback
missed to fixup posix CPU timers, which went unnoticed for almost four years.
Marco reported recently that the WARN_ON() in timer_wait_running()
triggers with a posix CPU timer test case.
Posix CPU timers have two execution models for expiring timers depending on
CONFIG_POSIX_CPU_TIMERS_TASK_WORK:
1) If not enabled, the expiry happens in hard interrupt context so
spin waiting on the remote CPU is reasonably time bound.
Implement an empty stub function for that case.
2) If enabled, the expiry happens in task work before returning to user
space or guest mode. The expired timers are marked as firing and moved
from the timer queue to a local list head with sighand lock held. Once
the timers are moved, sighand lock is dropped and the expiry happens in
fully preemptible context. That means the expiring task can be scheduled
out, migrated, interrupted etc. So spin waiting on it is more than
suboptimal.
The timer wheel has a timer_wait_running() mechanism for RT, which uses
a per CPU timer-base expiry lock which is held by the expiry code and the
task waiting for the timer function to complete blocks on that lock.
This does not work in the same way for posix CPU timers as there is no
timer base and expiry for process wide timers can run on any task
belonging to that process, but the concept of waiting on an expiry lock
can be used too in a slightly different way:
- Add a mutex to struct posix_cputimers_work. This struct is per task
and used to schedule the expiry task work from the timer interrupt.
- Add a task_struct pointer to struct cpu_timer which is used to store
a the task which runs the expiry. That's filled in when the task
moves the expired timers to the local expiry list. That's not
affecting the size of the k_itimer union as there are bigger union
members already
- Let the task take the expiry mutex around the expiry function
- Let the waiter acquire a task reference with rcu_read_lock() held and
block on the expiry mutex
This avoids spin-waiting on a task which might not even be on a CPU and
works nicely for RT too.
Fixes: ec8f954a40da ("posix-timers: Use a callback for cancel synchronization on PREEMPT_RT")
Reported-by: Marco Elver <elver(a)google.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Marco Elver <elver(a)google.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87zg764ojw.ffs@tglx
---
include/linux/posix-timers.h | 17 ++++---
kernel/time/posix-cpu-timers.c | 81 +++++++++++++++++++++++++++------
kernel/time/posix-timers.c | 4 ++-
3 files changed, 82 insertions(+), 20 deletions(-)
diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 2c6e99c..d607f51 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -4,6 +4,7 @@
#include <linux/spinlock.h>
#include <linux/list.h>
+#include <linux/mutex.h>
#include <linux/alarmtimer.h>
#include <linux/timerqueue.h>
@@ -62,16 +63,18 @@ static inline int clockid_to_fd(const clockid_t clk)
* cpu_timer - Posix CPU timer representation for k_itimer
* @node: timerqueue node to queue in the task/sig
* @head: timerqueue head on which this timer is queued
- * @task: Pointer to target task
+ * @pid: Pointer to target task PID
* @elist: List head for the expiry list
* @firing: Timer is currently firing
+ * @handling: Pointer to the task which handles expiry
*/
struct cpu_timer {
- struct timerqueue_node node;
- struct timerqueue_head *head;
- struct pid *pid;
- struct list_head elist;
- int firing;
+ struct timerqueue_node node;
+ struct timerqueue_head *head;
+ struct pid *pid;
+ struct list_head elist;
+ int firing;
+ struct task_struct __rcu *handling;
};
static inline bool cpu_timer_enqueue(struct timerqueue_head *head,
@@ -135,10 +138,12 @@ struct posix_cputimers {
/**
* posix_cputimers_work - Container for task work based posix CPU timer expiry
* @work: The task work to be scheduled
+ * @mutex: Mutex held around expiry in context of this task work
* @scheduled: @work has been scheduled already, no further processing
*/
struct posix_cputimers_work {
struct callback_head work;
+ struct mutex mutex;
unsigned int scheduled;
};
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 2f5e9b3..93c5a19 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -846,6 +846,8 @@ static u64 collect_timerqueue(struct timerqueue_head *head,
return expires;
ctmr->firing = 1;
+ /* See posix_cpu_timer_wait_running() */
+ rcu_assign_pointer(ctmr->handling, current);
cpu_timer_dequeue(ctmr);
list_add_tail(&ctmr->elist, firing);
}
@@ -1161,7 +1163,49 @@ static void handle_posix_cpu_timers(struct task_struct *tsk);
#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
static void posix_cpu_timers_work(struct callback_head *work)
{
+ struct posix_cputimers_work *cw = container_of(work, typeof(*cw), work);
+
+ mutex_lock(&cw->mutex);
handle_posix_cpu_timers(current);
+ mutex_unlock(&cw->mutex);
+}
+
+/*
+ * Invoked from the posix-timer core when a cancel operation failed because
+ * the timer is marked firing. The caller holds rcu_read_lock(), which
+ * protects the timer and the task which is expiring it from being freed.
+ */
+static void posix_cpu_timer_wait_running(struct k_itimer *timr)
+{
+ struct task_struct *tsk = rcu_dereference(timr->it.cpu.handling);
+
+ /* Has the handling task completed expiry already? */
+ if (!tsk)
+ return;
+
+ /* Ensure that the task cannot go away */
+ get_task_struct(tsk);
+ /* Now drop the RCU protection so the mutex can be locked */
+ rcu_read_unlock();
+ /* Wait on the expiry mutex */
+ mutex_lock(&tsk->posix_cputimers_work.mutex);
+ /* Release it immediately again. */
+ mutex_unlock(&tsk->posix_cputimers_work.mutex);
+ /* Drop the task reference. */
+ put_task_struct(tsk);
+ /* Relock RCU so the callsite is balanced */
+ rcu_read_lock();
+}
+
+static void posix_cpu_timer_wait_running_nsleep(struct k_itimer *timr)
+{
+ /* Ensure that timr->it.cpu.handling task cannot go away */
+ rcu_read_lock();
+ spin_unlock_irq(&timr->it_lock);
+ posix_cpu_timer_wait_running(timr);
+ rcu_read_unlock();
+ /* @timr is on stack and is valid */
+ spin_lock_irq(&timr->it_lock);
}
/*
@@ -1177,6 +1221,7 @@ void clear_posix_cputimers_work(struct task_struct *p)
sizeof(p->posix_cputimers_work.work));
init_task_work(&p->posix_cputimers_work.work,
posix_cpu_timers_work);
+ mutex_init(&p->posix_cputimers_work.mutex);
p->posix_cputimers_work.scheduled = false;
}
@@ -1255,6 +1300,18 @@ static inline void __run_posix_cpu_timers(struct task_struct *tsk)
lockdep_posixtimer_exit();
}
+static void posix_cpu_timer_wait_running(struct k_itimer *timr)
+{
+ cpu_relax();
+}
+
+static void posix_cpu_timer_wait_running_nsleep(struct k_itimer *timr)
+{
+ spin_unlock_irq(&timr.it_lock);
+ cpu_relax();
+ spin_lock_irq(&timr.it_lock);
+}
+
static inline bool posix_cpu_timers_work_scheduled(struct task_struct *tsk)
{
return false;
@@ -1363,6 +1420,8 @@ static void handle_posix_cpu_timers(struct task_struct *tsk)
*/
if (likely(cpu_firing >= 0))
cpu_timer_fire(timer);
+ /* See posix_cpu_timer_wait_running() */
+ rcu_assign_pointer(timer->it.cpu.handling, NULL);
spin_unlock(&timer->it_lock);
}
}
@@ -1497,23 +1556,16 @@ static int do_cpu_nanosleep(const clockid_t which_clock, int flags,
expires = cpu_timer_getexpires(&timer.it.cpu);
error = posix_cpu_timer_set(&timer, 0, &zero_it, &it);
if (!error) {
- /*
- * Timer is now unarmed, deletion can not fail.
- */
+ /* Timer is now unarmed, deletion can not fail. */
posix_cpu_timer_del(&timer);
+ } else {
+ while (error == TIMER_RETRY) {
+ posix_cpu_timer_wait_running_nsleep(&timer);
+ error = posix_cpu_timer_del(&timer);
+ }
}
- spin_unlock_irq(&timer.it_lock);
- while (error == TIMER_RETRY) {
- /*
- * We need to handle case when timer was or is in the
- * middle of firing. In other cases we already freed
- * resources.
- */
- spin_lock_irq(&timer.it_lock);
- error = posix_cpu_timer_del(&timer);
- spin_unlock_irq(&timer.it_lock);
- }
+ spin_unlock_irq(&timer.it_lock);
if ((it.it_value.tv_sec | it.it_value.tv_nsec) == 0) {
/*
@@ -1623,6 +1675,7 @@ const struct k_clock clock_posix_cpu = {
.timer_del = posix_cpu_timer_del,
.timer_get = posix_cpu_timer_get,
.timer_rearm = posix_cpu_timer_rearm,
+ .timer_wait_running = posix_cpu_timer_wait_running,
};
const struct k_clock clock_process = {
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 0c8a87a..808a247 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -846,6 +846,10 @@ static struct k_itimer *timer_wait_running(struct k_itimer *timer,
rcu_read_lock();
unlock_timer(timer, *flags);
+ /*
+ * kc->timer_wait_running() might drop RCU lock. So @timer
+ * cannot be touched anymore after the function returns!
+ */
if (!WARN_ON_ONCE(!kc->timer_wait_running))
kc->timer_wait_running(timer);
From: Xiaoyu Li <xiaoyu.li(a)corigine.com>
Before the referenced commit, if fewer interrupts are supported by
hardware than requested, then pci_msix_vec_count() returned the
former. However, after the referenced commit, an error is returned
for this condition. This causes a regression in the NFP driver
preventing probe from completing.
This situation may occur because the firmware allows sharing of
more than one queue per interrupt vector. And, thus, it is valid for
the firmware to advertise the number of queues it does. However,
interrupt sharing is not currently implemented by the NFP driver as
it seems likely - though not tested - that any gains obtained by
having more queues would be mitigated by sharing of interrupts.
Address this problem by limiting the number of vectors requested to
the number supported by hardware.
Also, make correct the max/min_irq types. They were unsigned
previously but should be signed.
Fixes: bab65e48cb06 ("PCI/MSI: Sanitize MSI-X checks")
CC: stable(a)vger.kernel.org
Signed-off-by: Xiaoyu Li <xiaoyu.li(a)corigine.com>
Acked-by: Simon Horman <simon.horman(a)corigine.com>
Signed-off-by: Louis Peens <louis.peens(a)corigine.com>
---
Changes: V1-->V2
* Updated the max/min_irq types to be signed instead of unsigned
* Fixed formatting of commit message to be better aligned at 72 chars
* Also updated the commit message to better explain why this is even
possible to happen, in response to the question from V1.
drivers/net/ethernet/netronome/nfp/nfp_net.h | 4 ++--
drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 12 +++++++++---
drivers/net/ethernet/netronome/nfp/nfp_net_main.c | 9 +++++----
drivers/net/ethernet/netronome/nfp/nfp_netvf_main.c | 8 ++++----
4 files changed, 20 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 939cfce15830..960f69325287 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -971,9 +971,9 @@ int nfp_net_mbox_reconfig_and_unlock(struct nfp_net *nn, u32 mbox_cmd);
void nfp_net_mbox_reconfig_post(struct nfp_net *nn, u32 update);
int nfp_net_mbox_reconfig_wait_posted(struct nfp_net *nn);
-unsigned int
+int
nfp_net_irqs_alloc(struct pci_dev *pdev, struct msix_entry *irq_entries,
- unsigned int min_irqs, unsigned int want_irqs);
+ int min_irqs, int want_irqs);
void nfp_net_irqs_disable(struct pci_dev *pdev);
void
nfp_net_irqs_assign(struct nfp_net *nn, struct msix_entry *irq_entries,
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 62f0bf91d1e1..ae309ea48356 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -362,14 +362,20 @@ int nfp_net_mbox_reconfig_and_unlock(struct nfp_net *nn, u32 mbox_cmd)
* @min_irqs: Minimal acceptable number of interrupts
* @wanted_irqs: Target number of interrupts to allocate
*
- * Return: Number of irqs obtained or 0 on error.
+ * Return: Number of irqs obtained or an errno.
*/
-unsigned int
+int
nfp_net_irqs_alloc(struct pci_dev *pdev, struct msix_entry *irq_entries,
- unsigned int min_irqs, unsigned int wanted_irqs)
+ int min_irqs, int wanted_irqs)
{
unsigned int i;
int got_irqs;
+ int max_irqs;
+
+ max_irqs = pci_msix_vec_count(pdev);
+ if (max_irqs < 0)
+ return max_irqs;
+ wanted_irqs = min_t(int, max_irqs, wanted_irqs);
for (i = 0; i < wanted_irqs; i++)
irq_entries[i].entry = i;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
index cbe4972ba104..c1ac380542b5 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
@@ -222,7 +222,8 @@ static void nfp_net_pf_clean_vnic(struct nfp_pf *pf, struct nfp_net *nn)
static int nfp_net_pf_alloc_irqs(struct nfp_pf *pf)
{
- unsigned int wanted_irqs, num_irqs, vnics_left, irqs_left;
+ unsigned int vnics_left, irqs_left;
+ int wanted_irqs, num_irqs;
struct nfp_net *nn;
/* Get MSI-X vectors */
@@ -237,10 +238,10 @@ static int nfp_net_pf_alloc_irqs(struct nfp_pf *pf)
num_irqs = nfp_net_irqs_alloc(pf->pdev, pf->irq_entries,
NFP_NET_MIN_VNIC_IRQS * pf->num_vnics,
wanted_irqs);
- if (!num_irqs) {
- nfp_warn(pf->cpp, "Unable to allocate MSI-X vectors\n");
+ if (num_irqs < 0) {
+ nfp_warn(pf->cpp, "Unable to allocate MSI-X vectors (err=%d)\n", num_irqs);
kfree(pf->irq_entries);
- return -ENOMEM;
+ return num_irqs;
}
/* Distribute IRQs to vNICs */
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_netvf_main.c b/drivers/net/ethernet/netronome/nfp/nfp_netvf_main.c
index e19bb0150cb5..5f89c7198606 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_netvf_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_netvf_main.c
@@ -84,7 +84,7 @@ static int nfp_netvf_pci_probe(struct pci_dev *pdev,
u32 tx_bar_sz, rx_bar_sz;
int tx_bar_no, rx_bar_no;
struct nfp_net_vf *vf;
- unsigned int num_irqs;
+ int num_irqs;
u8 __iomem *ctrl_bar;
struct nfp_net *nn;
u32 startq;
@@ -255,9 +255,9 @@ static int nfp_netvf_pci_probe(struct pci_dev *pdev,
NFP_NET_MIN_VNIC_IRQS,
NFP_NET_NON_Q_VECTORS +
nn->dp.num_r_vecs);
- if (!num_irqs) {
- nn_warn(nn, "Unable to allocate MSI-X Vectors. Exiting\n");
- err = -EIO;
+ if (num_irqs < 0) {
+ nn_warn(nn, "Unable to allocate MSI-X Vectors. Exiting (err=%d)\n", num_irqs);
+ err = num_irqs;
goto err_unmap_rx;
}
nfp_net_irqs_assign(nn, vf->irq_entries, num_irqs);
--
2.34.1
Hej min kära,
Jag är ledsen att jag stör dig och inkräktar på din integritet. Jag är
singel, ensam och i behov av en omtänksam, kärleksfull och romantisk
följeslagare.
Jag är en hemlig beundrare och skulle vilja utforska möjligheten att
lära mig mer om varandra. Jag vet att det är konstigt att kontakta dig
på det här sättet och jag hoppas att du kan förlåta mig. Jag är en blyg
person och det är det enda sättet jag vet att jag kan få din
uppmärksamhet. Jag vill bara veta vad du tycker och min avsikt är inte
att förolämpa dig. Jag hoppas att vi kan vara vänner om det är vad du
vill, även om jag vill vara mer än bara en vän. Jag vet att du har några
frågor att ställa och jag hoppas att jag kan tillfredsställa en del av
din nyfikenhet med några svar.
Jag tror på talesättet att för världen är du bara en person, men för
någon speciell är du världen, allt jag vill ha är kärlek, romantisk
omsorg och uppmärksamhet från en speciell följeslagare som jag hoppas
skulle vara du.
Jag hoppas att detta meddelande kommer att bli början på en långsiktig
kommunikation mellan oss, skicka bara ett svar på detta meddelande, det
kommer att göra mig glad.
Puss och kram,
Marion.
Mi dispiace disturbarti e invadere la tua privacy. Sono single,
solitario e bisognoso di un compagno premuroso, amorevole e romantico.
Sono un ammiratore segreto e vorrei esplorare l'opportunità di farlo
saperne di più l'uno sull'altro. So che è strano contattarti
in questo modo e spero che tu possa perdonarmi. Sono una persona timida e
questo è l'unico modo in cui so di poter attirare la tua attenzione. Voglio semplicemente
per sapere cosa ne pensate e la mia intenzione non è di offendervi.
Spero che possiamo essere amici se è quello che vuoi, anche se lo vorrei
essere più di un semplice amico. So che hai alcune domande da fare
chiedi e spero di poter soddisfare alcune delle tue curiosità con alcuni
risposte.
Credo nel detto che "per il mondo sei solo una persona,
ma per qualcuno di speciale tu sei il mondo'. Tutto quello che voglio è amore,
cure e attenzioni romantiche da una compagna speciale quale sono io
sperando saresti tu.
Spero che questo messaggio sia l'inizio di un lungo periodo
comunicazione tra di noi, è sufficiente inviare una risposta a questo messaggio, it
mi renderà felice.
Baci e abbracci,
Marion.
> This is for pre-6.4 kernels, as scrub code goes through a huge rework.
>
> [BUG]
> Even before the scrub rework, if we have some corrupted metadata failed
> to be repaired during replace, we still continue replace and let it
> finish just as there is nothing wrong:
>
> BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 started
> BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad csum, has 0x00000000 want 0xade80ca1
> BTRFS warning (device dm-4): tree block 5578752 mirror 0 has bad csum, has 0x00000000 want 0xade80ca1
> BTRFS warning (device dm-4): checksum error at logical 5578752 on dev /dev/mapper/test-scratch1, physical 5578752: metadata leaf (level 0) in tree 5
> BTRFS warning (device dm-4): checksum error at logical 5578752 on dev /dev/mapper/test-scratch1, physical 5578752: metadata leaf (level 0) in tree 5
> BTRFS error (device dm-4): bdev /dev/mapper/test-scratch1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
> BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad bytenr, has 0 want 5578752
> BTRFS error (device dm-4): unable to fixup (regular) error at logical 5578752 on dev /dev/mapper/test-scratch1
> BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 finished
>
> This can lead to unexpected problems for the result fs.
>
> [CAUSE]
> Btrfs reuses scrub code path for dev-replace to iterate all dev extents.
>
> But unlike scrub, dev-replace doesn't really bother to check the scrub
> progress, which records all the errors found during replace.
>
> And even if we checks the progress, we can not really determine which
> errors are minor, which are critical just by the plain numbers.
> (remember we don't treat metadata/data checksum error differently).
>
> This behavior is there from the very beginning.
>
> [FIX]
> Instead of continue the replace, just error out if we hit an unrepaired
> metadata sector.
>
> Now the dev-replace would be rejected with -EIO, to inform the user.
> Although it also means, the fs has some metadata error which can not be
> repaired, the user would be super upset anyway.
If one sector is bad in metadata how much secondary data is damaged?
As someone who recently had to remove, and currently replace a disk.
it is upsetting, if it stopped if we stopped because 0,01% of data is
unrepairable, if we can save the other 99,99%. Can we have it
continue, print an error message to standard out, and a way for the
user to delete or copy it (with som option like -force-delete or
--force-copy) with the error to the new disk?
"Metadata at block 5578752 is damaged and unrepaired. Skipping. Read
`man btrfs-replace` for more info. "
At the end if possible, list files affected by the damaged metadata blocks.
In man answer:
How can the user know what files are connected to the metadata?
How can a user decide what to do with the damaged metadata?
At minimum, can there be some useful info to the info to the error output? like
"Replace has stopped, due to reading unrepaired metadata block, was
working on block 5578752, se `dmesg` for more details. (\s Sorry but
you are currently f..k)"
>
> The new dmesg would look like this:
>
> BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 started
> BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad csum, has 0x00000000 want 0xade80ca1
> BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad csum, has 0x00000000 want 0xade80ca1
> BTRFS error (device dm-4): unable to fixup (regular) error at logical 5570560 on dev /dev/mapper/test-scratch1 physical 5570560
> BTRFS warning (device dm-4): header error at logical 5570560 on dev /dev/mapper/test-scratch1, physical 5570560: metadata leaf (level 0) in tree 5
> BTRFS warning (device dm-4): header error at logical 5570560 on dev /dev/mapper/test-scratch1, physical 5570560: metadata leaf (level 0) in tree 5
> BTRFS error (device dm-4): stripe 5570560 has unrepaired metadata sector at 5578752
> BTRFS error (device dm-4): btrfs_scrub_dev(/dev/mapper/test-scratch1, 1, /dev/mapper/test-scratch2) failed -5
>
> CC: stable(a)vger.kernel.org
> Signed-off-by: Qu Wenruo <wqu(a)suse.com>
> ---
> I'm not sure how should we merge this patch.
>
> The misc-next is already merging the new scrub code, but the problem is
> there for all old kernels thus we need such fixes.
>
> Maybe we can merge this fix before the scrub rework, then the rework,
> and finally the better fix using reworked interface?
> ---
> fs/btrfs/scrub.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index ef4046a2572c..71f64b9bcd9f 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -195,6 +195,7 @@ struct scrub_ctx {
> struct mutex wr_lock;
> struct btrfs_device *wr_tgtdev;
> bool flush_all_writes;
> + bool has_meta_failed;
>
> /*
> * statistics
> @@ -1380,6 +1381,8 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
> btrfs_err_rl_in_rcu(fs_info,
> "unable to fixup (regular) error at logical %llu on dev %s",
> logical, btrfs_dev_name(dev));
> + if (is_metadata)
> + sctx->has_meta_failed = true;
> }
>
> out:
> @@ -3838,6 +3841,12 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
>
> blk_finish_plug(&plug);
>
> + /*
> + * If we have metadata unable to be repaired, we should error
> + * out the dev-replace.
> + */
> + if (sctx->is_dev_replace && sctx->has_meta_failed && ret >= 0)
> + ret = -EIO;
> if (sctx->is_dev_replace && ret >= 0) {
> int ret2;
>
--
Torstein Eide
Torsteine(a)gmail.com
I sent an email to you yesterday but since I did not get a response,
I thought probably you did not receive it so I decided to send it
again and hopefully I will get a response this time around.
I am a secret admirer and would like to explore the opportunity to
learn more about each other. I know it is strange to contact you
this way and I hope you can forgive me. I am a shy person and
this is the only way I know I could get your attention. I just want
to know what you think and my intention is not to offend you.
I hope we can be friends if that is what you want, although I wish
to be more than just a friend. I know you have a few questions to
ask and I hope I can satisfy some of your curiosity with a few
answers.
I believe in the saying that 'to the world you are just one person,
but to someone special you are the world'. All I want is love,
romantic care and attention from a special companion which I am
hoping would be you.
I hope this message will be the beginning of a long term
communication between us, simply send a reply to this message, it
will make me happy.
Hugs and kisses,
Marion.
The patch titled
Subject: relayfs: fix out-of-bounds access in relay_file_read
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
relayfs-fix-out-of-bounds-access-in-relay_file_read.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Zhang Zhengming <zhang.zhengming(a)h3c.com>
Subject: relayfs: fix out-of-bounds access in relay_file_read
Date: Wed, 19 Apr 2023 12:02:03 +0800
There is a crash in relay_file_read, as the var from
point to the end of last subbuf.
The oops looks something like:
pc : __arch_copy_to_user+0x180/0x310
lr : relay_file_read+0x20c/0x2c8
Call trace:
__arch_copy_to_user+0x180/0x310
full_proxy_read+0x68/0x98
vfs_read+0xb0/0x1d0
ksys_read+0x6c/0xf0
__arm64_sys_read+0x20/0x28
el0_svc_common.constprop.3+0x84/0x108
do_el0_svc+0x74/0x90
el0_svc+0x1c/0x28
el0_sync_handler+0x88/0xb0
el0_sync+0x148/0x180
We get the condition by analyzing the vmcore:
1). The last produced byte and last consumed byte
both at the end of the last subbuf
2). A softirq calls function(e.g __blk_add_trace)
to write relay buffer occurs when an program is calling
relay_file_read_avail().
relay_file_read
relay_file_read_avail
relay_file_read_consume(buf, 0, 0);
//interrupted by softirq who will write subbuf
....
return 1;
//read_start point to the end of the last subbuf
read_start = relay_file_read_start_pos
//avail is equal to subsize
avail = relay_file_read_subbuf_avail
//from points to an invalid memory address
from = buf->start + read_start
//system is crashed
copy_to_user(buffer, from, avail)
Link: https://lkml.kernel.org/r/20230419040203.37676-1-zhang.zhengming@h3c.com
Fixes: 341a7213e5c1 ("kernel/relay.c: fix read_pos error when multiple readers")
Signed-off-by: Zhang Zhengming <zhang.zhengming(a)h3c.com>
Reviewed-by: Zhao Lei <zhao_lei1(a)hoperun.com>
Reviewed-by: Zhou Kete <zhou.kete(a)h3c.com>
Cc: Pengcheng Yang <yangpc(a)wangsu.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/relay.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/kernel/relay.c~relayfs-fix-out-of-bounds-access-in-relay_file_read
+++ a/kernel/relay.c
@@ -989,7 +989,8 @@ static size_t relay_file_read_start_pos(
size_t subbuf_size = buf->chan->subbuf_size;
size_t n_subbufs = buf->chan->n_subbufs;
size_t consumed = buf->subbufs_consumed % n_subbufs;
- size_t read_pos = consumed * subbuf_size + buf->bytes_consumed;
+ size_t read_pos = (consumed * subbuf_size + buf->bytes_consumed)
+ % (n_subbufs * subbuf_size);
read_subbuf = read_pos / subbuf_size;
padding = buf->padding[read_subbuf];
_
Patches currently in -mm which might be from zhang.zhengming(a)h3c.com are
relayfs-fix-out-of-bounds-access-in-relay_file_read.patch
The patch titled
Subject: maple_tree: make maple state reusable after mas_empty_area_rev()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
maple_tree-make-maple-state-reusable-after-mas_empty_area_rev.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: maple_tree: make maple state reusable after mas_empty_area_rev()
Date: Fri, 14 Apr 2023 10:57:26 -0400
Stop using maple state min/max for the range by passing through pointers
for those values. This will allow the maple state to be reused without
resetting.
Also add some logic to fail out early on searching with invalid
arguments.
Link: https://lkml.kernel.org/r/20230414145728.4067069-1-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
Cc: Peng Zhang <zhangpeng.00(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 27 +++++++++++++--------------
1 file changed, 13 insertions(+), 14 deletions(-)
--- a/lib/maple_tree.c~maple_tree-make-maple-state-reusable-after-mas_empty_area_rev
+++ a/lib/maple_tree.c
@@ -4965,7 +4965,8 @@ not_found:
* Return: True if found in a leaf, false otherwise.
*
*/
-static bool mas_rev_awalk(struct ma_state *mas, unsigned long size)
+static bool mas_rev_awalk(struct ma_state *mas, unsigned long size,
+ unsigned long *gap_min, unsigned long *gap_max)
{
enum maple_type type = mte_node_type(mas->node);
struct maple_node *node = mas_mn(mas);
@@ -5030,8 +5031,8 @@ static bool mas_rev_awalk(struct ma_stat
if (unlikely(ma_is_leaf(type))) {
mas->offset = offset;
- mas->min = min;
- mas->max = min + gap - 1;
+ *gap_min = min;
+ *gap_max = min + gap - 1;
return true;
}
@@ -5309,6 +5310,9 @@ int mas_empty_area(struct ma_state *mas,
unsigned long *pivots;
enum maple_type mt;
+ if (min >= max)
+ return -EINVAL;
+
if (mas_is_start(mas))
mas_start(mas);
else if (mas->offset >= 2)
@@ -5363,6 +5367,9 @@ int mas_empty_area_rev(struct ma_state *
{
struct maple_enode *last = mas->node;
+ if (min >= max)
+ return -EINVAL;
+
if (mas_is_start(mas)) {
mas_start(mas);
mas->offset = mas_data_end(mas);
@@ -5382,7 +5389,7 @@ int mas_empty_area_rev(struct ma_state *
mas->index = min;
mas->last = max;
- while (!mas_rev_awalk(mas, size)) {
+ while (!mas_rev_awalk(mas, size, &min, &max)) {
if (last == mas->node) {
if (!mas_rewind_node(mas))
return -EBUSY;
@@ -5397,17 +5404,9 @@ int mas_empty_area_rev(struct ma_state *
if (unlikely(mas->offset == MAPLE_NODE_SLOTS))
return -EBUSY;
- /*
- * mas_rev_awalk() has set mas->min and mas->max to the gap values. If
- * the maximum is outside the window we are searching, then use the last
- * location in the search.
- * mas->max and mas->min is the range of the gap.
- * mas->index and mas->last are currently set to the search range.
- */
-
/* Trim the upper limit to the max. */
- if (mas->max <= mas->last)
- mas->last = mas->max;
+ if (max <= mas->last)
+ mas->last = max;
mas->index = mas->last - size + 1;
return 0;
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
maple_tree-make-maple-state-reusable-after-mas_empty_area_rev.patch
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
The skl+ scalers only sample 12 bits of PIPESRC so we can't
do any plane scaling at all when the pipe source size is >4k.
Make sure the pipe source size is also below the scaler's src
size limits. Might not be 100% accurate, but should at least be
safe. We can refine the limits later if we discover that recent
hw is less restricted.
Cc: stable(a)vger.kernel.org
Tested-by: Ross Zwisler <zwisler(a)google.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8357
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/i915/display/skl_scaler.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/skl_scaler.c b/drivers/gpu/drm/i915/display/skl_scaler.c
index 473d53610b92..0e7e014fcc71 100644
--- a/drivers/gpu/drm/i915/display/skl_scaler.c
+++ b/drivers/gpu/drm/i915/display/skl_scaler.c
@@ -111,6 +111,8 @@ skl_update_scaler(struct intel_crtc_state *crtc_state, bool force_detach,
struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
const struct drm_display_mode *adjusted_mode =
&crtc_state->hw.adjusted_mode;
+ int pipe_src_w = drm_rect_width(&crtc_state->pipe_src);
+ int pipe_src_h = drm_rect_height(&crtc_state->pipe_src);
int min_src_w, min_src_h, min_dst_w, min_dst_h;
int max_src_w, max_src_h, max_dst_w, max_dst_h;
@@ -207,6 +209,21 @@ skl_update_scaler(struct intel_crtc_state *crtc_state, bool force_detach,
return -EINVAL;
}
+ /*
+ * The pipe scaler does not use all the bits of PIPESRC, at least
+ * on the earlier platforms. So even when we're scaling a plane
+ * the *pipe* source size must not be too large. For simplicity
+ * we assume the limits match the scaler source size limits. Might
+ * not be 100% accurate on all platforms, but good enough for now.
+ */
+ if (pipe_src_w > max_src_w || pipe_src_h > max_src_h) {
+ drm_dbg_kms(&dev_priv->drm,
+ "scaler_user index %u.%u: pipe src size %ux%u "
+ "is out of scaler range\n",
+ crtc->pipe, scaler_user, pipe_src_w, pipe_src_h);
+ return -EINVAL;
+ }
+
/* mark this plane as a scaler user in crtc_state */
scaler_state->scaler_users |= (1 << scaler_user);
drm_dbg_kms(&dev_priv->drm, "scaler_user index %u.%u: "
--
2.39.2
This is the start of the stable review cycle for the 4.19.281 release.
There are 57 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 20 Apr 2023 12:02:44 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.281-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.281-rc1
Marc Zyngier <marc.zyngier(a)arm.com>
arm64: KVM: Fix system register enumeration
Dave Martin <Dave.Martin(a)arm.com>
KVM: arm64: Filter out invalid core register IDs in KVM_GET_REG_LIST
Dave Martin <Dave.Martin(a)arm.com>
KVM: arm64: Factor out core register ID enumeration
Paolo Bonzini <pbonzini(a)redhat.com>
KVM: nVMX: add missing consistency checks for CR0 and CR4
Steve Clevenger <scclevenger(a)os.amperecomputing.com>
coresight-etm4: Fix for() loop drvdata->nr_addr_cmp range bug
George Cherian <george.cherian(a)marvell.com>
watchdog: sbsa_wdog: Make sure the timeout programming is within the limits
Waiman Long <longman(a)redhat.com>
cgroup/cpuset: Wake up cpuset_attach_wq tasks in cpuset_cancel_attach()
ZhaoLong Wang <wangzhaolong1(a)huawei.com>
ubi: Fix deadlock caused by recursively holding work_sem
Lee Jones <lee.jones(a)linaro.org>
mtd: ubi: wl: Fix a couple of kernel-doc issues
Zhihao Cheng <chengzhihao1(a)huawei.com>
ubi: Fix failure attaching when vid_hdr offset equals to (sub)page size
Basavaraj Natikar <Basavaraj.Natikar(a)amd.com>
x86/PCI: Add quirk for AMD XHCI controller that loses MSI-X state in D3hot
Jiri Kosina <jkosina(a)suse.cz>
scsi: ses: Handle enclosure with just a primary component gracefully
Robbie Harwood <rharwood(a)redhat.com>
verify_pefile: relax wrapper length check
Hans de Goede <hdegoede(a)redhat.com>
efi: sysfb_efi: Add quirk for Lenovo Yoga Book X91F/L
Alexander Stein <alexander.stein(a)ew.tq-group.com>
i2c: imx-lpi2c: clean rx/tx buffers upon new message
Grant Grundler <grundler(a)chromium.org>
power: supply: cros_usbpd: reclassify "default case!" as debug
Eric Dumazet <edumazet(a)google.com>
udp6: fix potential access to stale information
Roman Gushchin <roman.gushchin(a)linux.dev>
net: macb: fix a memory corruption in extended buffer descriptor mode
Xin Long <lucien.xin(a)gmail.com>
sctp: fix a potential overflow in sctp_ifwdtsn_skip
Denis Plotnikov <den-plotnikov(a)yandex-team.ru>
qlcnic: check pci_reset_function result
Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
niu: Fix missing unwind goto in niu_alloc_channels()
Zheng Wang <zyytlz.wz(a)163.com>
9p/xen : Fix use after free bug in xen_9pfs_front_remove due to race condition
Bang Li <libang.linuxer(a)gmail.com>
mtdblock: tolerate corrected bit-flips
Min Li <lm0963hack(a)gmail.com>
Bluetooth: Fix race condition in hidp_session_thread
Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
Oswald Buddenhagen <oswald.buddenhagen(a)gmx.de>
ALSA: hda/sigmatel: fix S/PDIF out on Intel D*45* motherboards
Oswald Buddenhagen <oswald.buddenhagen(a)gmx.de>
ALSA: i2c/cs8427: fix iec958 mixer control deactivation
Oswald Buddenhagen <oswald.buddenhagen(a)gmx.de>
ALSA: hda/sigmatel: add pin overrides for Intel DP45SG motherboard
Oswald Buddenhagen <oswald.buddenhagen(a)gmx.de>
ALSA: emu10k1: fix capture interrupt handler unlinking
Kornel Dulęba <korneld(a)chromium.org>
Revert "pinctrl: amd: Disable and mask interrupts on resume"
Rongwei Wang <rongwei.wang(a)linux.alibaba.com>
mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
Zheng Yejian <zhengyejian1(a)huawei.com>
ring-buffer: Fix race while reader and writer are on the same page
John Keeping <john(a)metanate.com>
ftrace: Mark get_lock_parent_ip() __always_inline
Kan Liang <kan.liang(a)linux.intel.com>
perf/core: Fix the same task check in perf_event_set_output
Jeremy Soller <jeremy(a)system76.com>
ALSA: hda/realtek: Add quirk for Clevo X370SNW
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix sysfs interface lifetime
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
Biju Das <biju.das.jz(a)bp.renesas.com>
tty: serial: sh-sci: Fix Rx on RZ/G2L SCI
Biju Das <biju.das.jz(a)bp.renesas.com>
tty: serial: sh-sci: Fix transmit end interrupt handler
William Breathitt Gray <william.gray(a)linaro.org>
iio: dac: cio-dac: Fix max DAC write value check for 12-bit
Bjørn Mork <bjorn(a)mork.no>
USB: serial: option: add Quectel RM500U-CN modem
Enrico Sau <enrico.sau(a)gmail.com>
USB: serial: option: add Telit FE990 compositions
Kees Jan Koster <kjkoster(a)kjkoster.org>
USB: serial: cp210x: add Silicon Labs IFS-USB-DATACABLE IDs
Dhruva Gole <d-gole(a)ti.com>
gpio: davinci: Add irq chip flag to skip set wake
Ziyang Xuan <william.xuanziyang(a)huawei.com>
ipv6: Fix an uninit variable access bug in __ip6_make_skb()
Xin Long <lucien.xin(a)gmail.com>
sctp: check send stream number after wait_for_sndbuf
Jakub Kicinski <kuba(a)kernel.org>
net: don't let netpoll invoke NAPI if in xmit context
Eric Dumazet <edumazet(a)google.com>
icmp: guard against too small mtu
Felix Fietkau <nbd(a)nbd.name>
wifi: mac80211: fix invalid drv_sta_pre_rcu_remove calls for non-uploaded sta
Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
pwm: cros-ec: Explicitly set .polarity in .get_state()
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFSv4: Fix hangs when recovering open state after a server reboot
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFSv4: Check the return value of update_open_stateid()
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFSv4: Convert struct nfs4_state to use refcount_t
Kornel Dulęba <korneld(a)chromium.org>
pinctrl: amd: Disable and mask interrupts on resume
Sachi King <nakato(a)nakato.io>
pinctrl: amd: disable and mask interrupts on probe
Linus Walleij <linus.walleij(a)linaro.org>
pinctrl: amd: Use irqchip template
Sandeep Singh <sandeep.singh(a)amd.com>
pinctrl: Added IRQF_SHARED flag for amd-pinctrl driver
-------------
Diffstat:
Documentation/sound/hd-audio/models.rst | 2 +-
Makefile | 4 +-
arch/arm64/kvm/guest.c | 83 ++++++++++++++++++++-----
arch/x86/kernel/sysfb_efi.c | 8 +++
arch/x86/kvm/vmx/vmx.c | 10 ++-
arch/x86/pci/fixup.c | 21 +++++++
crypto/asymmetric_keys/verify_pefile.c | 12 ++--
drivers/gpio/gpio-davinci.c | 2 +-
drivers/hwtracing/coresight/coresight-etm4x.c | 2 +-
drivers/i2c/busses/i2c-imx-lpi2c.c | 2 +
drivers/iio/dac/cio-dac.c | 4 +-
drivers/mtd/mtdblock.c | 12 ++--
drivers/mtd/ubi/build.c | 21 +++++--
drivers/mtd/ubi/wl.c | 5 +-
drivers/net/ethernet/cadence/macb_main.c | 4 ++
drivers/net/ethernet/qlogic/qlcnic/qlcnic_ctx.c | 8 ++-
drivers/net/ethernet/sun/niu.c | 2 +-
drivers/pinctrl/pinctrl-amd.c | 56 +++++++++++++----
drivers/power/supply/cros_usbpd-charger.c | 2 +-
drivers/pwm/pwm-cros-ec.c | 1 +
drivers/scsi/ses.c | 20 +++---
drivers/tty/serial/sh-sci.c | 9 ++-
drivers/usb/serial/cp210x.c | 1 +
drivers/usb/serial/option.c | 10 +++
drivers/watchdog/sbsa_gwdt.c | 1 +
fs/nfs/nfs4_fs.h | 2 +-
fs/nfs/nfs4proc.c | 25 ++++----
fs/nfs/nfs4state.c | 8 +--
fs/nilfs2/segment.c | 3 +-
fs/nilfs2/super.c | 2 +
fs/nilfs2/the_nilfs.c | 12 ++--
include/linux/ftrace.h | 2 +-
kernel/cgroup/cpuset.c | 4 +-
kernel/events/core.c | 2 +-
kernel/trace/ring_buffer.c | 13 +++-
mm/swapfile.c | 3 +-
net/9p/trans_xen.c | 4 ++
net/bluetooth/hidp/core.c | 2 +-
net/bluetooth/l2cap_core.c | 24 ++-----
net/core/netpoll.c | 19 +++++-
net/ipv4/icmp.c | 5 ++
net/ipv6/ip6_output.c | 7 ++-
net/ipv6/udp.c | 8 ++-
net/mac80211/sta_info.c | 3 +-
net/sctp/socket.c | 4 ++
net/sctp/stream_interleave.c | 3 +-
sound/i2c/cs8427.c | 7 ++-
sound/pci/emu10k1/emupcm.c | 4 +-
sound/pci/hda/patch_realtek.c | 1 +
sound/pci/hda/patch_sigmatel.c | 10 +++
50 files changed, 350 insertions(+), 129 deletions(-)