The reset GPIO was marked active-high, which is against what's specified
in the documentation. Mark the reset GPIO as active-low. With this
change, Bluetooth can now be used on the i9100.
Fixes: 8620cc2f99b7 ("ARM: dts: exynos: Add devicetree file for the Galaxy S2")
Cc: stable(a)vger.kernel.org
Signed-off-by: Paul Cercueil <paul(a)crapouillou.net>
---
arch/arm/boot/dts/exynos4210-i9100.dts | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/boot/dts/exynos4210-i9100.dts b/arch/arm/boot/dts/exynos4210-i9100.dts
index 55922176807e..5f5d9b135736 100644
--- a/arch/arm/boot/dts/exynos4210-i9100.dts
+++ b/arch/arm/boot/dts/exynos4210-i9100.dts
@@ -827,7 +827,7 @@ bluetooth {
compatible = "brcm,bcm4330-bt";
shutdown-gpios = <&gpl0 4 GPIO_ACTIVE_HIGH>;
- reset-gpios = <&gpl1 0 GPIO_ACTIVE_HIGH>;
+ reset-gpios = <&gpl1 0 GPIO_ACTIVE_LOW>;
device-wakeup-gpios = <&gpx3 1 GPIO_ACTIVE_HIGH>;
host-wakeup-gpios = <&gpx2 6 GPIO_ACTIVE_HIGH>;
};
--
2.33.0
When using performance policy, EPP value is restored to non "performance"
mode EPP after offline and online.
For example:
cat /sys/devices/system/cpu/cpu1/cpufreq/energy_performance_preference
performance
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 1 > /sys/devices/system/cpu/cpu1/online
cat /sys/devices/system/cpu/cpu1/cpufreq/energy_performance_preference
balance_performance
The commit 4adcf2e5829f ("cpufreq: intel_pstate: Add ->offline and ->online callbacks")
optimized save restore path of the HWP request MSR, when there is no
change in the policy. Also added special processing for performance mode
EPP. If EPP has been set to "performance" by the active mode "performance"
scaling algorithm, replace that value with the cached EPP. This ends up
replacing with cached EPP during offline, which is restored during online
again.
So add a change which will set cpu_data->epp_policy to zero, when in
performance policy and has non zero epp. In this way EPP is set to zero
again.
Fixes: 4adcf2e5829f ("cpufreq: intel_pstate: Add ->offline and ->online callbacks")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
Cc: stable(a)vger.kernel.org # v5.9+
---
drivers/cpufreq/intel_pstate.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 815df3daae9d..49ff24d2b0ea 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -930,17 +930,23 @@ static void intel_pstate_hwp_set(unsigned int cpu)
{
struct cpudata *cpu_data = all_cpu_data[cpu];
int max, min;
+ s16 epp = 0;
u64 value;
- s16 epp;
max = cpu_data->max_perf_ratio;
min = cpu_data->min_perf_ratio;
- if (cpu_data->policy == CPUFREQ_POLICY_PERFORMANCE)
- min = max;
-
rdmsrl_on_cpu(cpu, MSR_HWP_REQUEST, &value);
+ if (boot_cpu_has(X86_FEATURE_HWP_EPP))
+ epp = (value >> 24) & 0xff;
+
+ if (cpu_data->policy == CPUFREQ_POLICY_PERFORMANCE) {
+ min = max;
+ if (epp)
+ cpu_data->epp_policy = 0;
+ }
+
value &= ~HWP_MIN_PERF(~0L);
value |= HWP_MIN_PERF(min);
--
2.17.1
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3735459037114d31e5acd9894fad9aed104231a0 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Tue, 9 Nov 2021 14:53:57 +0100
Subject: [PATCH] PCI/MSI: Destroy sysfs before freeing entries
free_msi_irqs() frees the MSI entries before destroying the sysfs entries
which are exposing them. Nothing prevents a concurrent free while a sysfs
file is read and accesses the possibly freed entry.
Move the sysfs release ahead of freeing the entries.
Fixes: 1c51b50c2995 ("PCI/MSI: Export MSI mode using attributes, not kobjects")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Bjorn Helgaas <helgaas(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87sfw5305m.ffs@tglx
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 70433013897b..48e3f4e47b29 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -368,6 +368,11 @@ static void free_msi_irqs(struct pci_dev *dev)
for (i = 0; i < entry->nvec_used; i++)
BUG_ON(irq_has_action(entry->irq + i));
+ if (dev->msi_irq_groups) {
+ msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
+ dev->msi_irq_groups = NULL;
+ }
+
pci_msi_teardown_msi_irqs(dev);
list_for_each_entry_safe(entry, tmp, msi_list, list) {
@@ -379,11 +384,6 @@ static void free_msi_irqs(struct pci_dev *dev)
list_del(&entry->list);
free_msi_entry(entry);
}
-
- if (dev->msi_irq_groups) {
- msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
- dev->msi_irq_groups = NULL;
- }
}
static void pci_intx_for_msi(struct pci_dev *dev, int enable)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3735459037114d31e5acd9894fad9aed104231a0 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Tue, 9 Nov 2021 14:53:57 +0100
Subject: [PATCH] PCI/MSI: Destroy sysfs before freeing entries
free_msi_irqs() frees the MSI entries before destroying the sysfs entries
which are exposing them. Nothing prevents a concurrent free while a sysfs
file is read and accesses the possibly freed entry.
Move the sysfs release ahead of freeing the entries.
Fixes: 1c51b50c2995 ("PCI/MSI: Export MSI mode using attributes, not kobjects")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Bjorn Helgaas <helgaas(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87sfw5305m.ffs@tglx
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 70433013897b..48e3f4e47b29 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -368,6 +368,11 @@ static void free_msi_irqs(struct pci_dev *dev)
for (i = 0; i < entry->nvec_used; i++)
BUG_ON(irq_has_action(entry->irq + i));
+ if (dev->msi_irq_groups) {
+ msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
+ dev->msi_irq_groups = NULL;
+ }
+
pci_msi_teardown_msi_irqs(dev);
list_for_each_entry_safe(entry, tmp, msi_list, list) {
@@ -379,11 +384,6 @@ static void free_msi_irqs(struct pci_dev *dev)
list_del(&entry->list);
free_msi_entry(entry);
}
-
- if (dev->msi_irq_groups) {
- msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
- dev->msi_irq_groups = NULL;
- }
}
static void pci_intx_for_msi(struct pci_dev *dev, int enable)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3735459037114d31e5acd9894fad9aed104231a0 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Tue, 9 Nov 2021 14:53:57 +0100
Subject: [PATCH] PCI/MSI: Destroy sysfs before freeing entries
free_msi_irqs() frees the MSI entries before destroying the sysfs entries
which are exposing them. Nothing prevents a concurrent free while a sysfs
file is read and accesses the possibly freed entry.
Move the sysfs release ahead of freeing the entries.
Fixes: 1c51b50c2995 ("PCI/MSI: Export MSI mode using attributes, not kobjects")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Bjorn Helgaas <helgaas(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87sfw5305m.ffs@tglx
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 70433013897b..48e3f4e47b29 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -368,6 +368,11 @@ static void free_msi_irqs(struct pci_dev *dev)
for (i = 0; i < entry->nvec_used; i++)
BUG_ON(irq_has_action(entry->irq + i));
+ if (dev->msi_irq_groups) {
+ msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
+ dev->msi_irq_groups = NULL;
+ }
+
pci_msi_teardown_msi_irqs(dev);
list_for_each_entry_safe(entry, tmp, msi_list, list) {
@@ -379,11 +384,6 @@ static void free_msi_irqs(struct pci_dev *dev)
list_del(&entry->list);
free_msi_entry(entry);
}
-
- if (dev->msi_irq_groups) {
- msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
- dev->msi_irq_groups = NULL;
- }
}
static void pci_intx_for_msi(struct pci_dev *dev, int enable)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3735459037114d31e5acd9894fad9aed104231a0 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Tue, 9 Nov 2021 14:53:57 +0100
Subject: [PATCH] PCI/MSI: Destroy sysfs before freeing entries
free_msi_irqs() frees the MSI entries before destroying the sysfs entries
which are exposing them. Nothing prevents a concurrent free while a sysfs
file is read and accesses the possibly freed entry.
Move the sysfs release ahead of freeing the entries.
Fixes: 1c51b50c2995 ("PCI/MSI: Export MSI mode using attributes, not kobjects")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Bjorn Helgaas <helgaas(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87sfw5305m.ffs@tglx
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 70433013897b..48e3f4e47b29 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -368,6 +368,11 @@ static void free_msi_irqs(struct pci_dev *dev)
for (i = 0; i < entry->nvec_used; i++)
BUG_ON(irq_has_action(entry->irq + i));
+ if (dev->msi_irq_groups) {
+ msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
+ dev->msi_irq_groups = NULL;
+ }
+
pci_msi_teardown_msi_irqs(dev);
list_for_each_entry_safe(entry, tmp, msi_list, list) {
@@ -379,11 +384,6 @@ static void free_msi_irqs(struct pci_dev *dev)
list_del(&entry->list);
free_msi_entry(entry);
}
-
- if (dev->msi_irq_groups) {
- msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
- dev->msi_irq_groups = NULL;
- }
}
static void pci_intx_for_msi(struct pci_dev *dev, int enable)
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3735459037114d31e5acd9894fad9aed104231a0 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Tue, 9 Nov 2021 14:53:57 +0100
Subject: [PATCH] PCI/MSI: Destroy sysfs before freeing entries
free_msi_irqs() frees the MSI entries before destroying the sysfs entries
which are exposing them. Nothing prevents a concurrent free while a sysfs
file is read and accesses the possibly freed entry.
Move the sysfs release ahead of freeing the entries.
Fixes: 1c51b50c2995 ("PCI/MSI: Export MSI mode using attributes, not kobjects")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Bjorn Helgaas <helgaas(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87sfw5305m.ffs@tglx
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 70433013897b..48e3f4e47b29 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -368,6 +368,11 @@ static void free_msi_irqs(struct pci_dev *dev)
for (i = 0; i < entry->nvec_used; i++)
BUG_ON(irq_has_action(entry->irq + i));
+ if (dev->msi_irq_groups) {
+ msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
+ dev->msi_irq_groups = NULL;
+ }
+
pci_msi_teardown_msi_irqs(dev);
list_for_each_entry_safe(entry, tmp, msi_list, list) {
@@ -379,11 +384,6 @@ static void free_msi_irqs(struct pci_dev *dev)
list_del(&entry->list);
free_msi_entry(entry);
}
-
- if (dev->msi_irq_groups) {
- msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
- dev->msi_irq_groups = NULL;
- }
}
static void pci_intx_for_msi(struct pci_dev *dev, int enable)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a923a2676e60683aee46aa4b93c30aff240ac20d Mon Sep 17 00:00:00 2001
From: "Maciej W. Rozycki" <macro(a)orcam.me.uk>
Date: Fri, 22 Oct 2021 00:58:23 +0200
Subject: [PATCH] MIPS: Fix assembly error from MIPSr2 code used within
MIPS_ISA_ARCH_LEVEL
Fix assembly errors like:
{standard input}: Assembler messages:
{standard input}:287: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32'
{standard input}:680: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32'
{standard input}:1274: Error: opcode not supported on this processor: mips3 (mips3) `dins $12,$9,32,32'
{standard input}:2175: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32'
make[1]: *** [scripts/Makefile.build:277: mm/highmem.o] Error 1
with code produced from `__cmpxchg64' for MIPS64r2 CPU configurations
using CONFIG_32BIT and CONFIG_PHYS_ADDR_T_64BIT.
This is due to MIPS_ISA_ARCH_LEVEL downgrading the assembly architecture
to `r4000' i.e. MIPS III for MIPS64r2 configurations, while there is a
block of code containing a DINS MIPS64r2 instruction conditionalized on
MIPS_ISA_REV >= 2 within the scope of the downgrade.
The assembly architecture override code pattern has been put there for
LL/SC instructions, so that code compiles for configurations that select
a processor to build for that does not support these instructions while
still providing run-time support for processors that do, dynamically
switched by non-constant `cpu_has_llsc'. It went in with linux-mips.org
commit aac8aa7717a2 ("Enable a suitable ISA for the assembler around
ll/sc so that code builds even for processors that don't support the
instructions. Plus minor formatting fixes.") back in 2005.
Fix the problem by wrapping these instructions along with the adjacent
SYNC instructions only, following the practice established with commit
cfd54de3b0e4 ("MIPS: Avoid move psuedo-instruction whilst using
MIPS_ISA_LEVEL") and commit 378ed6f0e3c5 ("MIPS: Avoid using .set mips0
to restore ISA"). Strictly speaking the SYNC instructions do not have
to be wrapped as they are only used as a Loongson3 erratum workaround,
so they will be enabled in the assembler by default, but do this so as
to keep code consistent with other places.
Reported-by: kernel test robot <lkp(a)intel.com>
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Fixes: c7e2d71dda7a ("MIPS: Fix set_pte() for Netlogic XLR using cmpxchg64()")
Cc: stable(a)vger.kernel.org # v5.1+
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index 0b983800f48b..66a8b293fd80 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -249,6 +249,7 @@ static inline unsigned long __cmpxchg64(volatile void *ptr,
/* Load 64 bits from ptr */
" " __SYNC(full, loongson3_war) " \n"
"1: lld %L0, %3 # __cmpxchg64 \n"
+ " .set pop \n"
/*
* Split the 64 bit value we loaded into the 2 registers that hold the
* ret variable.
@@ -276,12 +277,14 @@ static inline unsigned long __cmpxchg64(volatile void *ptr,
" or %L1, %L1, $at \n"
" .set at \n"
# endif
+ " .set push \n"
+ " .set " MIPS_ISA_ARCH_LEVEL " \n"
/* Attempt to store new at ptr */
" scd %L1, %2 \n"
/* If we failed, loop! */
"\t" __SC_BEQZ "%L1, 1b \n"
- " .set pop \n"
"2: " __SYNC(full, loongson3_war) " \n"
+ " .set pop \n"
: "=&r"(ret),
"=&r"(tmp),
"=" GCC_OFF_SMALL_ASM() (*(unsigned long long *)ptr)
Please include "smb3: do not error on fsync when readonly"
Commit id:
71e6864eacbef0b2645ca043cdfbac272cb6cea3
It fixes an fsync problem over SMB3 mounts, reported by Julian, when
the file is not opened for write.
--
Thanks,
Steve
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 712a951025c0667ff00b25afc360f74e639dfabe Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi(a)redhat.com>
Date: Tue, 2 Nov 2021 11:10:37 +0100
Subject: [PATCH] fuse: fix page stealing
It is possible to trigger a crash by splicing anon pipe bufs to the fuse
device.
The reason for this is that anon_pipe_buf_release() will reuse buf->page if
the refcount is 1, but that page might have already been stolen and its
flags modified (e.g. PG_lru added).
This happens in the unlikely case of fuse_dev_splice_write() getting around
to calling pipe_buf_release() after a page has been stolen, added to the
page cache and removed from the page cache.
Fix by calling pipe_buf_release() right after the page was inserted into
the page cache. In this case the page has an elevated refcount so any
release function will know that the page isn't reusable.
Reported-by: Frank Dinoff <fdinoff(a)google.com>
Link: https://lore.kernel.org/r/CAAmZXrsGg2xsP1CK+cbuEMumtrqdvD-NKnWzhNcvn71RV3c1…
Fixes: dd3bb14f44a6 ("fuse: support splice() writing to fuse device")
Cc: <stable(a)vger.kernel.org> # v2.6.35
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 337fa6f5a7c6..79f7eda49e06 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -847,6 +847,12 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
replace_page_cache_page(oldpage, newpage);
+ /*
+ * Release while we have extra ref on stolen page. Otherwise
+ * anon_pipe_buf_release() might think the page can be reused.
+ */
+ pipe_buf_release(cs->pipe, buf);
+
get_page(newpage);
if (!(buf->flags & PIPE_BUF_FLAG_LRU))
@@ -2031,8 +2037,12 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe,
pipe_lock(pipe);
out_free:
- for (idx = 0; idx < nbuf; idx++)
- pipe_buf_release(pipe, &bufs[idx]);
+ for (idx = 0; idx < nbuf; idx++) {
+ struct pipe_buffer *buf = &bufs[idx];
+
+ if (buf->ops)
+ pipe_buf_release(pipe, buf);
+ }
pipe_unlock(pipe);
kvfree(bufs);
This is a backport of the remaining patches from the below series:
https://lore.kernel.org/all/cover.1633464148.git.naveen.n.rao@linux.vnet.ib…
Kindly apply to the longterm tree for v5.4
Thanks,
Naveen
Naveen N. Rao (5):
powerpc/lib: Add helper to check if offset is within conditional
branch range
powerpc/bpf: Validate branch ranges
powerpc/bpf: Fix BPF_SUB when imm == 0x80000000
powerpc/security: Add a helper to query stf_barrier type
powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC
arch/powerpc/include/asm/code-patching.h | 1 +
arch/powerpc/include/asm/security_features.h | 5 ++
arch/powerpc/kernel/security.c | 5 ++
arch/powerpc/lib/code-patching.c | 7 +-
arch/powerpc/net/bpf_jit.h | 33 ++++---
arch/powerpc/net/bpf_jit64.h | 8 +-
arch/powerpc/net/bpf_jit_comp64.c | 91 ++++++++++++++++----
7 files changed, 117 insertions(+), 33 deletions(-)
base-commit: c65356f0f7268b1260dd64415c2145e73640872e
--
2.33.1
This is a backport of the remaining patches from the below series:
https://lore.kernel.org/all/cover.1633464148.git.naveen.n.rao@linux.vnet.ib…
Kindly apply to the longterm tree for v4.19
Thanks,
Naveen
Naveen N. Rao (5):
powerpc/lib: Add helper to check if offset is within conditional
branch range
powerpc/bpf: Validate branch ranges
powerpc/bpf: Fix BPF_SUB when imm == 0x80000000
powerpc/security: Add a helper to query stf_barrier type
powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC
arch/powerpc/include/asm/code-patching.h | 1 +
arch/powerpc/include/asm/security_features.h | 5 ++
arch/powerpc/kernel/security.c | 5 ++
arch/powerpc/lib/code-patching.c | 7 +-
arch/powerpc/net/bpf_jit.h | 33 ++++---
arch/powerpc/net/bpf_jit64.h | 8 +-
arch/powerpc/net/bpf_jit_comp64.c | 93 ++++++++++++++++----
7 files changed, 118 insertions(+), 34 deletions(-)
base-commit: 3033e5726834e4c9c8c48cdb2273f33bd105f938
--
2.33.1
This is a backport of the remaining patches from the below series:
https://lore.kernel.org/all/cover.1633464148.git.naveen.n.rao@linux.vnet.ib…
Kindly apply to the longterm tree for v4.14
Thanks,
Naveen
Naveen N. Rao (3):
powerpc/lib: Add helper to check if offset is within conditional
branch range
powerpc/bpf: Validate branch ranges
powerpc/bpf: Fix BPF_SUB when imm == 0x80000000
arch/powerpc/include/asm/code-patching.h | 1 +
arch/powerpc/lib/code-patching.c | 7 ++++-
arch/powerpc/net/bpf_jit.h | 33 +++++++++++++--------
arch/powerpc/net/bpf_jit_comp64.c | 37 +++++++++++++++---------
4 files changed, 52 insertions(+), 26 deletions(-)
base-commit: 0447aa205abe1c0c016b4f7fa9d7c08d920b5c8e
--
2.33.1
This is a backport of the remaining patches from the below series:
https://lore.kernel.org/all/cover.1633464148.git.naveen.n.rao@linux.vnet.ib…
Kindly apply to the longterm tree for v4.9
Thanks,
Naveen
Naveen N. Rao (2):
powerpc/bpf: Validate branch ranges
powerpc/bpf: Fix BPF_SUB when imm == 0x80000000
arch/powerpc/net/bpf_jit.h | 25 ++++++++++++++++-----
arch/powerpc/net/bpf_jit_comp64.c | 37 ++++++++++++++++++++-----------
2 files changed, 43 insertions(+), 19 deletions(-)
base-commit: 59d4178b517472a1f023886baded2191458a76b5
--
2.33.1
This is a note to let you know that I've just added the patch titled
staging: r8188eu: Use kzalloc() with GFP_ATOMIC in atomic context
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From c15a059f85de49c542e6ec2464967dd2b2aa18f6 Mon Sep 17 00:00:00 2001
From: "Fabio M. De Francesco" <fmdefrancesco(a)gmail.com>
Date: Mon, 1 Nov 2021 20:18:47 +0100
Subject: staging: r8188eu: Use kzalloc() with GFP_ATOMIC in atomic context
Use the GFP_ATOMIC flag of kzalloc() with two memory allocation in
report_del_sta_event(). This function is called while holding spinlocks,
therefore it is not allowed to sleep. With the GFP_ATOMIC type flag, the
allocation is high priority and must not sleep.
This issue is detected by Smatch which emits the following warning:
"drivers/staging/r8188eu/core/rtw_mlme_ext.c:6848 report_del_sta_event()
warn: sleeping in atomic context".
After the change, the post-commit hook output the following message:
"CHECK: Prefer kzalloc(sizeof(*pcmd_obj)...) over
kzalloc(sizeof(struct cmd_obj)...)".
According to the above "CHECK", use the preferred style in the first
kzalloc().
Fixes: 79f712ea994d ("staging: r8188eu: Remove wrappers for kalloc() and kzalloc()")
Fixes: 15865124feed ("staging: r8188eu: introduce new core dir for RTL8188eu driver")
Signed-off-by: Fabio M. De Francesco <fmdefrancesco(a)gmail.com>
Link: https://lore.kernel.org/r/20211101191847.6749-1-fmdefrancesco@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: stable <stable(a)vger.kernel.org>
---
drivers/staging/r8188eu/core/rtw_mlme_ext.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/staging/r8188eu/core/rtw_mlme_ext.c b/drivers/staging/r8188eu/core/rtw_mlme_ext.c
index 5b60e6df5f87..b4820ad2cee7 100644
--- a/drivers/staging/r8188eu/core/rtw_mlme_ext.c
+++ b/drivers/staging/r8188eu/core/rtw_mlme_ext.c
@@ -6847,12 +6847,12 @@ void report_del_sta_event(struct adapter *padapter, unsigned char *MacAddr, unsi
struct mlme_ext_priv *pmlmeext = &padapter->mlmeextpriv;
struct cmd_priv *pcmdpriv = &padapter->cmdpriv;
- pcmd_obj = kzalloc(sizeof(struct cmd_obj), GFP_KERNEL);
+ pcmd_obj = kzalloc(sizeof(*pcmd_obj), GFP_ATOMIC);
if (!pcmd_obj)
return;
cmdsz = (sizeof(struct stadel_event) + sizeof(struct C2HEvent_Header));
- pevtcmd = kzalloc(cmdsz, GFP_KERNEL);
+ pevtcmd = kzalloc(cmdsz, GFP_ATOMIC);
if (!pevtcmd) {
kfree(pcmd_obj);
return;
--
2.33.1
This is a note to let you know that I've just added the patch titled
staging: r8188eu: use GFP_ATOMIC under spinlock
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 4a293eaf92a510ff688dc7b3f0815221f99c9d1b Mon Sep 17 00:00:00 2001
From: Michael Straube <straube.linux(a)gmail.com>
Date: Mon, 8 Nov 2021 11:55:37 +0100
Subject: staging: r8188eu: use GFP_ATOMIC under spinlock
In function rtw_report_sec_ie() kzalloc() is called under a spinlock,
so the allocation have to be atomic.
Call tree:
-> rtw_select_and_join_from_scanned_queue() <- takes a spinlock
-> rtw_joinbss_cmd()
-> rtw_restruct_sec_ie()
-> rtw_report_sec_ie()
Fixes: 2b42bd58b321 ("staging: r8188eu: introduce new os_dep dir for RTL8188eu driver")
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Michael Straube <straube.linux(a)gmail.com>
Link: https://lore.kernel.org/r/20211108105537.31655-1-straube.linux@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/r8188eu/os_dep/mlme_linux.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/r8188eu/os_dep/mlme_linux.c b/drivers/staging/r8188eu/os_dep/mlme_linux.c
index a9b6ffdbf31a..f7ce724ebf87 100644
--- a/drivers/staging/r8188eu/os_dep/mlme_linux.c
+++ b/drivers/staging/r8188eu/os_dep/mlme_linux.c
@@ -112,7 +112,7 @@ void rtw_report_sec_ie(struct adapter *adapter, u8 authmode, u8 *sec_ie)
buff = NULL;
if (authmode == _WPA_IE_ID_) {
- buff = kzalloc(IW_CUSTOM_MAX, GFP_KERNEL);
+ buff = kzalloc(IW_CUSTOM_MAX, GFP_ATOMIC);
if (!buff)
return;
p = buff;
--
2.33.1
This is a note to let you know that I've just added the patch titled
staging: r8188eu: fix a memory leak in rtw_wx_read32()
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From be4ea8f383551b9dae11b8dfff1f38b3b5436e9a Mon Sep 17 00:00:00 2001
From: Dan Carpenter <dan.carpenter(a)oracle.com>
Date: Tue, 9 Nov 2021 14:49:36 +0300
Subject: staging: r8188eu: fix a memory leak in rtw_wx_read32()
Free "ptmp" before returning -EINVAL.
Fixes: 2b42bd58b321 ("staging: r8188eu: introduce new os_dep dir for RTL8188eu driver")
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Link: https://lore.kernel.org/r/20211109114935.GC16587@kili
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/r8188eu/os_dep/ioctl_linux.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/staging/r8188eu/os_dep/ioctl_linux.c b/drivers/staging/r8188eu/os_dep/ioctl_linux.c
index 52d42e576443..9404355726d0 100644
--- a/drivers/staging/r8188eu/os_dep/ioctl_linux.c
+++ b/drivers/staging/r8188eu/os_dep/ioctl_linux.c
@@ -1980,6 +1980,7 @@ static int rtw_wx_read32(struct net_device *dev,
u32 data32;
u32 bytes;
u8 *ptmp;
+ int ret;
padapter = (struct adapter *)rtw_netdev_priv(dev);
p = &wrqu->data;
@@ -2007,12 +2008,17 @@ static int rtw_wx_read32(struct net_device *dev,
break;
default:
DBG_88E(KERN_INFO "%s: usage> read [bytes],[address(hex)]\n", __func__);
- return -EINVAL;
+ ret = -EINVAL;
+ goto err_free_ptmp;
}
DBG_88E(KERN_INFO "%s: addr = 0x%08X data =%s\n", __func__, addr, extra);
kfree(ptmp);
return 0;
+
+err_free_ptmp:
+ kfree(ptmp);
+ return ret;
}
static int rtw_wx_write32(struct net_device *dev,
--
2.33.1
This is a note to let you know that I've just added the patch titled
staging/fbtft: Fix backlight
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 7865dd24934ad580d1bcde8f63c39f324211a23b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Noralf=20Tr=C3=B8nnes?= <noralf(a)tronnes.org>
Date: Fri, 5 Nov 2021 21:43:58 +0100
Subject: staging/fbtft: Fix backlight
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Commit b4a1ed0cd18b ("fbdev: make FB_BACKLIGHT a tristate") forgot to
update fbtft breaking its backlight support when FB_BACKLIGHT is a module.
Since FB_TFT selects FB_BACKLIGHT there's no need for this conditional
so just remove it and we're good.
Fixes: b4a1ed0cd18b ("fbdev: make FB_BACKLIGHT a tristate")
Cc: <stable(a)vger.kernel.org>
Acked-by: Sam Ravnborg <sam(a)ravnborg.org>
Signed-off-by: Noralf Trønnes <noralf(a)tronnes.org>
Link: https://lore.kernel.org/r/20211105204358.2991-1-noralf@tronnes.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/fbtft/fb_ssd1351.c | 4 ----
drivers/staging/fbtft/fbtft-core.c | 9 +--------
2 files changed, 1 insertion(+), 12 deletions(-)
diff --git a/drivers/staging/fbtft/fb_ssd1351.c b/drivers/staging/fbtft/fb_ssd1351.c
index cf263a58a148..6fd549a424d5 100644
--- a/drivers/staging/fbtft/fb_ssd1351.c
+++ b/drivers/staging/fbtft/fb_ssd1351.c
@@ -187,7 +187,6 @@ static struct fbtft_display display = {
},
};
-#ifdef CONFIG_FB_BACKLIGHT
static int update_onboard_backlight(struct backlight_device *bd)
{
struct fbtft_par *par = bl_get_data(bd);
@@ -231,9 +230,6 @@ static void register_onboard_backlight(struct fbtft_par *par)
if (!par->fbtftops.unregister_backlight)
par->fbtftops.unregister_backlight = fbtft_unregister_backlight;
}
-#else
-static void register_onboard_backlight(struct fbtft_par *par) { };
-#endif
FBTFT_REGISTER_DRIVER(DRVNAME, "solomon,ssd1351", &display);
diff --git a/drivers/staging/fbtft/fbtft-core.c b/drivers/staging/fbtft/fbtft-core.c
index ecb5f75f6dd5..f2684d2d6851 100644
--- a/drivers/staging/fbtft/fbtft-core.c
+++ b/drivers/staging/fbtft/fbtft-core.c
@@ -128,7 +128,6 @@ static int fbtft_request_gpios(struct fbtft_par *par)
return 0;
}
-#ifdef CONFIG_FB_BACKLIGHT
static int fbtft_backlight_update_status(struct backlight_device *bd)
{
struct fbtft_par *par = bl_get_data(bd);
@@ -161,6 +160,7 @@ void fbtft_unregister_backlight(struct fbtft_par *par)
par->info->bl_dev = NULL;
}
}
+EXPORT_SYMBOL(fbtft_unregister_backlight);
static const struct backlight_ops fbtft_bl_ops = {
.get_brightness = fbtft_backlight_get_brightness,
@@ -198,12 +198,7 @@ void fbtft_register_backlight(struct fbtft_par *par)
if (!par->fbtftops.unregister_backlight)
par->fbtftops.unregister_backlight = fbtft_unregister_backlight;
}
-#else
-void fbtft_register_backlight(struct fbtft_par *par) { };
-void fbtft_unregister_backlight(struct fbtft_par *par) { };
-#endif
EXPORT_SYMBOL(fbtft_register_backlight);
-EXPORT_SYMBOL(fbtft_unregister_backlight);
static void fbtft_set_addr_win(struct fbtft_par *par, int xs, int ys, int xe,
int ye)
@@ -853,13 +848,11 @@ int fbtft_register_framebuffer(struct fb_info *fb_info)
fb_info->fix.smem_len >> 10, text1,
HZ / fb_info->fbdefio->delay, text2);
-#ifdef CONFIG_FB_BACKLIGHT
/* Turn on backlight if available */
if (fb_info->bl_dev) {
fb_info->bl_dev->props.power = FB_BLANK_UNBLANK;
fb_info->bl_dev->ops->update_status(fb_info->bl_dev);
}
-#endif
return 0;
--
2.33.1
This is a note to let you know that I've just added the patch titled
staging: r8188eu: Fix breakage introduced when 5G code was removed
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From d5f0b804368951b6b4a77d2f14b5bb6a04b0e011 Mon Sep 17 00:00:00 2001
From: Larry Finger <Larry.Finger(a)lwfinger.net>
Date: Sun, 7 Nov 2021 11:35:43 -0600
Subject: staging: r8188eu: Fix breakage introduced when 5G code was removed
In commit 221abd4d478a ("staging: r8188eu: Remove no more necessary definitions
and code"), two entries were removed from RTW_ChannelPlanMap[], but not replaced
with zeros. The position within this table is important, thus the patch broke
systems operating in regulatory domains osted later than entry 0x13 in the table.
Unfortunately, the FCC entry comes before that point and most testers did not see
this problem.
Fixes: 221abd4d478a ("staging: r8188eu: Remove no more necessary definitions and code")
Cc: Stable <stable(a)vger.kernel.org> # v5.5+
Reported-and-tested-by: Zameer Manji <zmanji(a)gmail.com>
Reported-by: kernel test robot <lkp(a)intel.com>
Reviewed-by: Phillip Potter <phil(a)philpotter.co.uk>
Signed-off-by: Larry Finger <Larry.Finger(a)lwfinger.net>
Link: https://lore.kernel.org/r/20211107173543.7486-1-Larry.Finger@lwfinger.net
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/r8188eu/core/rtw_mlme_ext.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/staging/r8188eu/core/rtw_mlme_ext.c b/drivers/staging/r8188eu/core/rtw_mlme_ext.c
index 55c3d4a6faeb..5b60e6df5f87 100644
--- a/drivers/staging/r8188eu/core/rtw_mlme_ext.c
+++ b/drivers/staging/r8188eu/core/rtw_mlme_ext.c
@@ -107,6 +107,7 @@ static struct rt_channel_plan_map RTW_ChannelPlanMap[RT_CHANNEL_DOMAIN_MAX] = {
{0x01}, /* 0x10, RT_CHANNEL_DOMAIN_JAPAN */
{0x02}, /* 0x11, RT_CHANNEL_DOMAIN_FCC_NO_DFS */
{0x01}, /* 0x12, RT_CHANNEL_DOMAIN_JAPAN_NO_DFS */
+ {0x00}, /* 0x13 */
{0x02}, /* 0x14, RT_CHANNEL_DOMAIN_TAIWAN_NO_DFS */
{0x00}, /* 0x15, RT_CHANNEL_DOMAIN_ETSI_NO_DFS */
{0x00}, /* 0x16, RT_CHANNEL_DOMAIN_KOREA_NO_DFS */
@@ -118,6 +119,7 @@ static struct rt_channel_plan_map RTW_ChannelPlanMap[RT_CHANNEL_DOMAIN_MAX] = {
{0x00}, /* 0x1C, */
{0x00}, /* 0x1D, */
{0x00}, /* 0x1E, */
+ {0x00}, /* 0x1F, */
/* 0x20 ~ 0x7F , New Define ===== */
{0x00}, /* 0x20, RT_CHANNEL_DOMAIN_WORLD_NULL */
{0x01}, /* 0x21, RT_CHANNEL_DOMAIN_ETSI1_NULL */
--
2.33.1
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: v4l2-ioctl.c: readbuffers depends on V4L2_CAP_READWRITE
Author: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Date: Wed Nov 3 12:28:31 2021 +0000
If V4L2_CAP_READWRITE is not set, then readbuffers must be set to 0,
otherwise v4l2-compliance will complain.
A note on the Fixes tag below: this patch does not really fix that commit,
but it can be applied from that commit onwards. For older code there is no
guarantee that device_caps is set, so even though this patch would apply,
it will not work reliably.
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Fixes: 049e684f2de9 (media: v4l2-dev: fix WARN_ON(!vdev->device_caps))
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
drivers/media/v4l2-core/v4l2-ioctl.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
---
diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
index 31d0109ce5a8..69b74d0e8a90 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -2090,6 +2090,7 @@ static int v4l_prepare_buf(const struct v4l2_ioctl_ops *ops,
static int v4l_g_parm(const struct v4l2_ioctl_ops *ops,
struct file *file, void *fh, void *arg)
{
+ struct video_device *vfd = video_devdata(file);
struct v4l2_streamparm *p = arg;
v4l2_std_id std;
int ret = check_fmt(file, p->type);
@@ -2101,7 +2102,8 @@ static int v4l_g_parm(const struct v4l2_ioctl_ops *ops,
if (p->type != V4L2_BUF_TYPE_VIDEO_CAPTURE &&
p->type != V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE)
return -EINVAL;
- p->parm.capture.readbuffers = 2;
+ if (vfd->device_caps & V4L2_CAP_READWRITE)
+ p->parm.capture.readbuffers = 2;
ret = ops->vidioc_g_std(file, fh, &std);
if (ret == 0)
v4l2_video_std_frame_period(std, &p->parm.capture.timeperframe);
Hello, stable maintainers.
I'd like to request the change made by
cca2aac8acf470b01066f559acd7146fc4c32ae8 be applied to the 5.15 and
5.14 series so as to address the build failure. That patch doesn't
apply to 5.14 as-is so I am attaching a backported version of the patch
that does.
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0cf48167b87e388fa1268c9fe6d2443ae7f43d8a Mon Sep 17 00:00:00 2001
From: Sebastian Krzyszkowiak <sebastian.krzyszkowiak(a)puri.sm>
Date: Tue, 14 Sep 2021 14:18:05 +0200
Subject: [PATCH] power: supply: max17042_battery: Clear status bits in
interrupt handler
The gauge requires us to clear the status bits manually for some alerts
to be properly dismissed. Previously the IRQ was configured to react only
on falling edge, which wasn't technically correct (the ALRT line is active
low), but it had a happy side-effect of preventing interrupt storms
on uncleared alerts from happening.
Fixes: 7fbf6b731bca ("power: supply: max17042: Do not enforce (incorrect) interrupt trigger type")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Sebastian Krzyszkowiak <sebastian.krzyszkowiak(a)puri.sm>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel(a)collabora.com>
diff --git a/drivers/power/supply/max17042_battery.c b/drivers/power/supply/max17042_battery.c
index 32f331480487..2eb61856f3e4 100644
--- a/drivers/power/supply/max17042_battery.c
+++ b/drivers/power/supply/max17042_battery.c
@@ -879,6 +879,10 @@ static irqreturn_t max17042_thread_handler(int id, void *dev)
max17042_set_soc_threshold(chip, 1);
}
+ /* we implicitly handle all alerts via power_supply_changed */
+ regmap_clear_bits(chip->regmap, MAX17042_STATUS,
+ 0xFFFF & ~(STATUS_POR_BIT | STATUS_BST_BIT));
+
power_supply_changed(chip->battery);
return IRQ_HANDLED;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0cf48167b87e388fa1268c9fe6d2443ae7f43d8a Mon Sep 17 00:00:00 2001
From: Sebastian Krzyszkowiak <sebastian.krzyszkowiak(a)puri.sm>
Date: Tue, 14 Sep 2021 14:18:05 +0200
Subject: [PATCH] power: supply: max17042_battery: Clear status bits in
interrupt handler
The gauge requires us to clear the status bits manually for some alerts
to be properly dismissed. Previously the IRQ was configured to react only
on falling edge, which wasn't technically correct (the ALRT line is active
low), but it had a happy side-effect of preventing interrupt storms
on uncleared alerts from happening.
Fixes: 7fbf6b731bca ("power: supply: max17042: Do not enforce (incorrect) interrupt trigger type")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Sebastian Krzyszkowiak <sebastian.krzyszkowiak(a)puri.sm>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel(a)collabora.com>
diff --git a/drivers/power/supply/max17042_battery.c b/drivers/power/supply/max17042_battery.c
index 32f331480487..2eb61856f3e4 100644
--- a/drivers/power/supply/max17042_battery.c
+++ b/drivers/power/supply/max17042_battery.c
@@ -879,6 +879,10 @@ static irqreturn_t max17042_thread_handler(int id, void *dev)
max17042_set_soc_threshold(chip, 1);
}
+ /* we implicitly handle all alerts via power_supply_changed */
+ regmap_clear_bits(chip->regmap, MAX17042_STATUS,
+ 0xFFFF & ~(STATUS_POR_BIT | STATUS_BST_BIT));
+
power_supply_changed(chip->battery);
return IRQ_HANDLED;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0cf48167b87e388fa1268c9fe6d2443ae7f43d8a Mon Sep 17 00:00:00 2001
From: Sebastian Krzyszkowiak <sebastian.krzyszkowiak(a)puri.sm>
Date: Tue, 14 Sep 2021 14:18:05 +0200
Subject: [PATCH] power: supply: max17042_battery: Clear status bits in
interrupt handler
The gauge requires us to clear the status bits manually for some alerts
to be properly dismissed. Previously the IRQ was configured to react only
on falling edge, which wasn't technically correct (the ALRT line is active
low), but it had a happy side-effect of preventing interrupt storms
on uncleared alerts from happening.
Fixes: 7fbf6b731bca ("power: supply: max17042: Do not enforce (incorrect) interrupt trigger type")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Sebastian Krzyszkowiak <sebastian.krzyszkowiak(a)puri.sm>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel(a)collabora.com>
diff --git a/drivers/power/supply/max17042_battery.c b/drivers/power/supply/max17042_battery.c
index 32f331480487..2eb61856f3e4 100644
--- a/drivers/power/supply/max17042_battery.c
+++ b/drivers/power/supply/max17042_battery.c
@@ -879,6 +879,10 @@ static irqreturn_t max17042_thread_handler(int id, void *dev)
max17042_set_soc_threshold(chip, 1);
}
+ /* we implicitly handle all alerts via power_supply_changed */
+ regmap_clear_bits(chip->regmap, MAX17042_STATUS,
+ 0xFFFF & ~(STATUS_POR_BIT | STATUS_BST_BIT));
+
power_supply_changed(chip->battery);
return IRQ_HANDLED;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c87761db2100677a69be551365105125d872af5b Mon Sep 17 00:00:00 2001
From: Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
Date: Wed, 13 Oct 2021 19:13:45 +0300
Subject: [PATCH] component: do not leave master devres group open after bind
In current code, the devres group for aggregate master is left open
after call to component_master_add_*(). This leads to problems when the
master does further managed allocations on its own. When any
participating driver calls component_del(), this leads to immediate
release of resources.
This came up when investigating a page fault occurring with i915 DRM
driver unbind with 5.15-rc1 kernel. The following sequence occurs:
i915_pci_remove()
-> intel_display_driver_unregister()
-> i915_audio_component_cleanup()
-> component_del()
-> component.c:take_down_master()
-> hdac_component_master_unbind() [via master->ops->unbind()]
-> devres_release_group(master->parent, NULL)
With older kernels this has not caused issues, but with audio driver
moving to use managed interfaces for more of its allocations, this no
longer works. Devres log shows following to occur:
component_master_add_with_match()
[ 126.886032] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000323ccdc5 devm_component_match_release (24 bytes)
[ 126.886045] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000865cdb29 grp< (0 bytes)
[ 126.886049] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 grp< (0 bytes)
audio driver completes its PCI probe()
[ 126.892238] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 pcim_iomap_release (48 bytes)
component_del() called() at DRM/i915 unbind()
[ 137.579422] i915 0000:00:02.0: DEVRES REL 00000000ef44c293 grp< (0 bytes)
[ 137.579445] snd_hda_intel 0000:00:1f.3: DEVRES REL 00000000865cdb29 grp< (0 bytes)
[ 137.579458] snd_hda_intel 0000:00:1f.3: DEVRES REL 000000001b480725 pcim_iomap_release (48 bytes)
So the "devres_release_group(master->parent, NULL)" ends up freeing the
pcim_iomap allocation. Upon next runtime resume, the audio driver will
cause a page fault as the iomap alloc was released without the driver
knowing about it.
Fix this issue by using the "struct master" pointer as identifier for
the devres group, and by closing the devres group after
the master->ops->bind() call is done. This allows devres allocations
done by the driver acting as master to be isolated from the binding state
of the aggregate driver. This modifies the logic originally introduced in
commit 9e1ccb4a7700 ("drivers/base: fix devres handling for master device")
Fixes: 9e1ccb4a7700 ("drivers/base: fix devres handling for master device")
Cc: stable(a)vger.kernel.org
Acked-by: Imre Deak <imre.deak(a)intel.com>
Acked-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
BugLink: https://gitlab.freedesktop.org/drm/intel/-/issues/4136
Link: https://lore.kernel.org/r/20211013161345.3755341-1-kai.vehmanen@linux.intel…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/base/component.c b/drivers/base/component.c
index 6dc309913ccd..2d25a6416587 100644
--- a/drivers/base/component.c
+++ b/drivers/base/component.c
@@ -245,7 +245,7 @@ static int try_to_bring_up_master(struct master *master,
return 0;
}
- if (!devres_open_group(master->parent, NULL, GFP_KERNEL))
+ if (!devres_open_group(master->parent, master, GFP_KERNEL))
return -ENOMEM;
/* Found all components */
@@ -257,6 +257,7 @@ static int try_to_bring_up_master(struct master *master,
return ret;
}
+ devres_close_group(master->parent, NULL);
master->bound = true;
return 1;
}
@@ -281,7 +282,7 @@ static void take_down_master(struct master *master)
{
if (master->bound) {
master->ops->unbind(master->parent);
- devres_release_group(master->parent, NULL);
+ devres_release_group(master->parent, master);
master->bound = false;
}
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c87761db2100677a69be551365105125d872af5b Mon Sep 17 00:00:00 2001
From: Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
Date: Wed, 13 Oct 2021 19:13:45 +0300
Subject: [PATCH] component: do not leave master devres group open after bind
In current code, the devres group for aggregate master is left open
after call to component_master_add_*(). This leads to problems when the
master does further managed allocations on its own. When any
participating driver calls component_del(), this leads to immediate
release of resources.
This came up when investigating a page fault occurring with i915 DRM
driver unbind with 5.15-rc1 kernel. The following sequence occurs:
i915_pci_remove()
-> intel_display_driver_unregister()
-> i915_audio_component_cleanup()
-> component_del()
-> component.c:take_down_master()
-> hdac_component_master_unbind() [via master->ops->unbind()]
-> devres_release_group(master->parent, NULL)
With older kernels this has not caused issues, but with audio driver
moving to use managed interfaces for more of its allocations, this no
longer works. Devres log shows following to occur:
component_master_add_with_match()
[ 126.886032] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000323ccdc5 devm_component_match_release (24 bytes)
[ 126.886045] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000865cdb29 grp< (0 bytes)
[ 126.886049] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 grp< (0 bytes)
audio driver completes its PCI probe()
[ 126.892238] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 pcim_iomap_release (48 bytes)
component_del() called() at DRM/i915 unbind()
[ 137.579422] i915 0000:00:02.0: DEVRES REL 00000000ef44c293 grp< (0 bytes)
[ 137.579445] snd_hda_intel 0000:00:1f.3: DEVRES REL 00000000865cdb29 grp< (0 bytes)
[ 137.579458] snd_hda_intel 0000:00:1f.3: DEVRES REL 000000001b480725 pcim_iomap_release (48 bytes)
So the "devres_release_group(master->parent, NULL)" ends up freeing the
pcim_iomap allocation. Upon next runtime resume, the audio driver will
cause a page fault as the iomap alloc was released without the driver
knowing about it.
Fix this issue by using the "struct master" pointer as identifier for
the devres group, and by closing the devres group after
the master->ops->bind() call is done. This allows devres allocations
done by the driver acting as master to be isolated from the binding state
of the aggregate driver. This modifies the logic originally introduced in
commit 9e1ccb4a7700 ("drivers/base: fix devres handling for master device")
Fixes: 9e1ccb4a7700 ("drivers/base: fix devres handling for master device")
Cc: stable(a)vger.kernel.org
Acked-by: Imre Deak <imre.deak(a)intel.com>
Acked-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
BugLink: https://gitlab.freedesktop.org/drm/intel/-/issues/4136
Link: https://lore.kernel.org/r/20211013161345.3755341-1-kai.vehmanen@linux.intel…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/base/component.c b/drivers/base/component.c
index 6dc309913ccd..2d25a6416587 100644
--- a/drivers/base/component.c
+++ b/drivers/base/component.c
@@ -245,7 +245,7 @@ static int try_to_bring_up_master(struct master *master,
return 0;
}
- if (!devres_open_group(master->parent, NULL, GFP_KERNEL))
+ if (!devres_open_group(master->parent, master, GFP_KERNEL))
return -ENOMEM;
/* Found all components */
@@ -257,6 +257,7 @@ static int try_to_bring_up_master(struct master *master,
return ret;
}
+ devres_close_group(master->parent, NULL);
master->bound = true;
return 1;
}
@@ -281,7 +282,7 @@ static void take_down_master(struct master *master)
{
if (master->bound) {
master->ops->unbind(master->parent);
- devres_release_group(master->parent, NULL);
+ devres_release_group(master->parent, master);
master->bound = false;
}
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c87761db2100677a69be551365105125d872af5b Mon Sep 17 00:00:00 2001
From: Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
Date: Wed, 13 Oct 2021 19:13:45 +0300
Subject: [PATCH] component: do not leave master devres group open after bind
In current code, the devres group for aggregate master is left open
after call to component_master_add_*(). This leads to problems when the
master does further managed allocations on its own. When any
participating driver calls component_del(), this leads to immediate
release of resources.
This came up when investigating a page fault occurring with i915 DRM
driver unbind with 5.15-rc1 kernel. The following sequence occurs:
i915_pci_remove()
-> intel_display_driver_unregister()
-> i915_audio_component_cleanup()
-> component_del()
-> component.c:take_down_master()
-> hdac_component_master_unbind() [via master->ops->unbind()]
-> devres_release_group(master->parent, NULL)
With older kernels this has not caused issues, but with audio driver
moving to use managed interfaces for more of its allocations, this no
longer works. Devres log shows following to occur:
component_master_add_with_match()
[ 126.886032] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000323ccdc5 devm_component_match_release (24 bytes)
[ 126.886045] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000865cdb29 grp< (0 bytes)
[ 126.886049] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 grp< (0 bytes)
audio driver completes its PCI probe()
[ 126.892238] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 pcim_iomap_release (48 bytes)
component_del() called() at DRM/i915 unbind()
[ 137.579422] i915 0000:00:02.0: DEVRES REL 00000000ef44c293 grp< (0 bytes)
[ 137.579445] snd_hda_intel 0000:00:1f.3: DEVRES REL 00000000865cdb29 grp< (0 bytes)
[ 137.579458] snd_hda_intel 0000:00:1f.3: DEVRES REL 000000001b480725 pcim_iomap_release (48 bytes)
So the "devres_release_group(master->parent, NULL)" ends up freeing the
pcim_iomap allocation. Upon next runtime resume, the audio driver will
cause a page fault as the iomap alloc was released without the driver
knowing about it.
Fix this issue by using the "struct master" pointer as identifier for
the devres group, and by closing the devres group after
the master->ops->bind() call is done. This allows devres allocations
done by the driver acting as master to be isolated from the binding state
of the aggregate driver. This modifies the logic originally introduced in
commit 9e1ccb4a7700 ("drivers/base: fix devres handling for master device")
Fixes: 9e1ccb4a7700 ("drivers/base: fix devres handling for master device")
Cc: stable(a)vger.kernel.org
Acked-by: Imre Deak <imre.deak(a)intel.com>
Acked-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
BugLink: https://gitlab.freedesktop.org/drm/intel/-/issues/4136
Link: https://lore.kernel.org/r/20211013161345.3755341-1-kai.vehmanen@linux.intel…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/base/component.c b/drivers/base/component.c
index 6dc309913ccd..2d25a6416587 100644
--- a/drivers/base/component.c
+++ b/drivers/base/component.c
@@ -245,7 +245,7 @@ static int try_to_bring_up_master(struct master *master,
return 0;
}
- if (!devres_open_group(master->parent, NULL, GFP_KERNEL))
+ if (!devres_open_group(master->parent, master, GFP_KERNEL))
return -ENOMEM;
/* Found all components */
@@ -257,6 +257,7 @@ static int try_to_bring_up_master(struct master *master,
return ret;
}
+ devres_close_group(master->parent, NULL);
master->bound = true;
return 1;
}
@@ -281,7 +282,7 @@ static void take_down_master(struct master *master)
{
if (master->bound) {
master->ops->unbind(master->parent);
- devres_release_group(master->parent, NULL);
+ devres_release_group(master->parent, master);
master->bound = false;
}
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c87761db2100677a69be551365105125d872af5b Mon Sep 17 00:00:00 2001
From: Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
Date: Wed, 13 Oct 2021 19:13:45 +0300
Subject: [PATCH] component: do not leave master devres group open after bind
In current code, the devres group for aggregate master is left open
after call to component_master_add_*(). This leads to problems when the
master does further managed allocations on its own. When any
participating driver calls component_del(), this leads to immediate
release of resources.
This came up when investigating a page fault occurring with i915 DRM
driver unbind with 5.15-rc1 kernel. The following sequence occurs:
i915_pci_remove()
-> intel_display_driver_unregister()
-> i915_audio_component_cleanup()
-> component_del()
-> component.c:take_down_master()
-> hdac_component_master_unbind() [via master->ops->unbind()]
-> devres_release_group(master->parent, NULL)
With older kernels this has not caused issues, but with audio driver
moving to use managed interfaces for more of its allocations, this no
longer works. Devres log shows following to occur:
component_master_add_with_match()
[ 126.886032] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000323ccdc5 devm_component_match_release (24 bytes)
[ 126.886045] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000865cdb29 grp< (0 bytes)
[ 126.886049] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 grp< (0 bytes)
audio driver completes its PCI probe()
[ 126.892238] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 pcim_iomap_release (48 bytes)
component_del() called() at DRM/i915 unbind()
[ 137.579422] i915 0000:00:02.0: DEVRES REL 00000000ef44c293 grp< (0 bytes)
[ 137.579445] snd_hda_intel 0000:00:1f.3: DEVRES REL 00000000865cdb29 grp< (0 bytes)
[ 137.579458] snd_hda_intel 0000:00:1f.3: DEVRES REL 000000001b480725 pcim_iomap_release (48 bytes)
So the "devres_release_group(master->parent, NULL)" ends up freeing the
pcim_iomap allocation. Upon next runtime resume, the audio driver will
cause a page fault as the iomap alloc was released without the driver
knowing about it.
Fix this issue by using the "struct master" pointer as identifier for
the devres group, and by closing the devres group after
the master->ops->bind() call is done. This allows devres allocations
done by the driver acting as master to be isolated from the binding state
of the aggregate driver. This modifies the logic originally introduced in
commit 9e1ccb4a7700 ("drivers/base: fix devres handling for master device")
Fixes: 9e1ccb4a7700 ("drivers/base: fix devres handling for master device")
Cc: stable(a)vger.kernel.org
Acked-by: Imre Deak <imre.deak(a)intel.com>
Acked-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
BugLink: https://gitlab.freedesktop.org/drm/intel/-/issues/4136
Link: https://lore.kernel.org/r/20211013161345.3755341-1-kai.vehmanen@linux.intel…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/base/component.c b/drivers/base/component.c
index 6dc309913ccd..2d25a6416587 100644
--- a/drivers/base/component.c
+++ b/drivers/base/component.c
@@ -245,7 +245,7 @@ static int try_to_bring_up_master(struct master *master,
return 0;
}
- if (!devres_open_group(master->parent, NULL, GFP_KERNEL))
+ if (!devres_open_group(master->parent, master, GFP_KERNEL))
return -ENOMEM;
/* Found all components */
@@ -257,6 +257,7 @@ static int try_to_bring_up_master(struct master *master,
return ret;
}
+ devres_close_group(master->parent, NULL);
master->bound = true;
return 1;
}
@@ -281,7 +282,7 @@ static void take_down_master(struct master *master)
{
if (master->bound) {
master->ops->unbind(master->parent);
- devres_release_group(master->parent, NULL);
+ devres_release_group(master->parent, master);
master->bound = false;
}
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c87761db2100677a69be551365105125d872af5b Mon Sep 17 00:00:00 2001
From: Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
Date: Wed, 13 Oct 2021 19:13:45 +0300
Subject: [PATCH] component: do not leave master devres group open after bind
In current code, the devres group for aggregate master is left open
after call to component_master_add_*(). This leads to problems when the
master does further managed allocations on its own. When any
participating driver calls component_del(), this leads to immediate
release of resources.
This came up when investigating a page fault occurring with i915 DRM
driver unbind with 5.15-rc1 kernel. The following sequence occurs:
i915_pci_remove()
-> intel_display_driver_unregister()
-> i915_audio_component_cleanup()
-> component_del()
-> component.c:take_down_master()
-> hdac_component_master_unbind() [via master->ops->unbind()]
-> devres_release_group(master->parent, NULL)
With older kernels this has not caused issues, but with audio driver
moving to use managed interfaces for more of its allocations, this no
longer works. Devres log shows following to occur:
component_master_add_with_match()
[ 126.886032] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000323ccdc5 devm_component_match_release (24 bytes)
[ 126.886045] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000865cdb29 grp< (0 bytes)
[ 126.886049] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 grp< (0 bytes)
audio driver completes its PCI probe()
[ 126.892238] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 pcim_iomap_release (48 bytes)
component_del() called() at DRM/i915 unbind()
[ 137.579422] i915 0000:00:02.0: DEVRES REL 00000000ef44c293 grp< (0 bytes)
[ 137.579445] snd_hda_intel 0000:00:1f.3: DEVRES REL 00000000865cdb29 grp< (0 bytes)
[ 137.579458] snd_hda_intel 0000:00:1f.3: DEVRES REL 000000001b480725 pcim_iomap_release (48 bytes)
So the "devres_release_group(master->parent, NULL)" ends up freeing the
pcim_iomap allocation. Upon next runtime resume, the audio driver will
cause a page fault as the iomap alloc was released without the driver
knowing about it.
Fix this issue by using the "struct master" pointer as identifier for
the devres group, and by closing the devres group after
the master->ops->bind() call is done. This allows devres allocations
done by the driver acting as master to be isolated from the binding state
of the aggregate driver. This modifies the logic originally introduced in
commit 9e1ccb4a7700 ("drivers/base: fix devres handling for master device")
Fixes: 9e1ccb4a7700 ("drivers/base: fix devres handling for master device")
Cc: stable(a)vger.kernel.org
Acked-by: Imre Deak <imre.deak(a)intel.com>
Acked-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
BugLink: https://gitlab.freedesktop.org/drm/intel/-/issues/4136
Link: https://lore.kernel.org/r/20211013161345.3755341-1-kai.vehmanen@linux.intel…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/base/component.c b/drivers/base/component.c
index 6dc309913ccd..2d25a6416587 100644
--- a/drivers/base/component.c
+++ b/drivers/base/component.c
@@ -245,7 +245,7 @@ static int try_to_bring_up_master(struct master *master,
return 0;
}
- if (!devres_open_group(master->parent, NULL, GFP_KERNEL))
+ if (!devres_open_group(master->parent, master, GFP_KERNEL))
return -ENOMEM;
/* Found all components */
@@ -257,6 +257,7 @@ static int try_to_bring_up_master(struct master *master,
return ret;
}
+ devres_close_group(master->parent, NULL);
master->bound = true;
return 1;
}
@@ -281,7 +282,7 @@ static void take_down_master(struct master *master)
{
if (master->bound) {
master->ops->unbind(master->parent);
- devres_release_group(master->parent, NULL);
+ devres_release_group(master->parent, master);
master->bound = false;
}
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c87761db2100677a69be551365105125d872af5b Mon Sep 17 00:00:00 2001
From: Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
Date: Wed, 13 Oct 2021 19:13:45 +0300
Subject: [PATCH] component: do not leave master devres group open after bind
In current code, the devres group for aggregate master is left open
after call to component_master_add_*(). This leads to problems when the
master does further managed allocations on its own. When any
participating driver calls component_del(), this leads to immediate
release of resources.
This came up when investigating a page fault occurring with i915 DRM
driver unbind with 5.15-rc1 kernel. The following sequence occurs:
i915_pci_remove()
-> intel_display_driver_unregister()
-> i915_audio_component_cleanup()
-> component_del()
-> component.c:take_down_master()
-> hdac_component_master_unbind() [via master->ops->unbind()]
-> devres_release_group(master->parent, NULL)
With older kernels this has not caused issues, but with audio driver
moving to use managed interfaces for more of its allocations, this no
longer works. Devres log shows following to occur:
component_master_add_with_match()
[ 126.886032] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000323ccdc5 devm_component_match_release (24 bytes)
[ 126.886045] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000865cdb29 grp< (0 bytes)
[ 126.886049] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 grp< (0 bytes)
audio driver completes its PCI probe()
[ 126.892238] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 pcim_iomap_release (48 bytes)
component_del() called() at DRM/i915 unbind()
[ 137.579422] i915 0000:00:02.0: DEVRES REL 00000000ef44c293 grp< (0 bytes)
[ 137.579445] snd_hda_intel 0000:00:1f.3: DEVRES REL 00000000865cdb29 grp< (0 bytes)
[ 137.579458] snd_hda_intel 0000:00:1f.3: DEVRES REL 000000001b480725 pcim_iomap_release (48 bytes)
So the "devres_release_group(master->parent, NULL)" ends up freeing the
pcim_iomap allocation. Upon next runtime resume, the audio driver will
cause a page fault as the iomap alloc was released without the driver
knowing about it.
Fix this issue by using the "struct master" pointer as identifier for
the devres group, and by closing the devres group after
the master->ops->bind() call is done. This allows devres allocations
done by the driver acting as master to be isolated from the binding state
of the aggregate driver. This modifies the logic originally introduced in
commit 9e1ccb4a7700 ("drivers/base: fix devres handling for master device")
Fixes: 9e1ccb4a7700 ("drivers/base: fix devres handling for master device")
Cc: stable(a)vger.kernel.org
Acked-by: Imre Deak <imre.deak(a)intel.com>
Acked-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
BugLink: https://gitlab.freedesktop.org/drm/intel/-/issues/4136
Link: https://lore.kernel.org/r/20211013161345.3755341-1-kai.vehmanen@linux.intel…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/base/component.c b/drivers/base/component.c
index 6dc309913ccd..2d25a6416587 100644
--- a/drivers/base/component.c
+++ b/drivers/base/component.c
@@ -245,7 +245,7 @@ static int try_to_bring_up_master(struct master *master,
return 0;
}
- if (!devres_open_group(master->parent, NULL, GFP_KERNEL))
+ if (!devres_open_group(master->parent, master, GFP_KERNEL))
return -ENOMEM;
/* Found all components */
@@ -257,6 +257,7 @@ static int try_to_bring_up_master(struct master *master,
return ret;
}
+ devres_close_group(master->parent, NULL);
master->bound = true;
return 1;
}
@@ -281,7 +282,7 @@ static void take_down_master(struct master *master)
{
if (master->bound) {
master->ops->unbind(master->parent);
- devres_release_group(master->parent, NULL);
+ devres_release_group(master->parent, master);
master->bound = false;
}
}
From: Charan Teja Reddy <charante(a)codeaurora.org>
[ Upstream commit f492283b157053e9555787262f058ae33096f568 ]
It is expected from the clients to follow the below steps on an imported
dmabuf fd:
a) dmabuf = dma_buf_get(fd) // Get the dmabuf from fd
b) dma_buf_attach(dmabuf); // Clients attach to the dmabuf
o Here the kernel does some slab allocations, say for
dma_buf_attachment and may be some other slab allocation in the
dmabuf->ops->attach().
c) Client may need to do dma_buf_map_attachment().
d) Accordingly dma_buf_unmap_attachment() should be called.
e) dma_buf_detach () // Clients detach to the dmabuf.
o Here the slab allocations made in b) are freed.
f) dma_buf_put(dmabuf) // Can free the dmabuf if it is the last
reference.
Now say an erroneous client failed at step c) above thus it directly
called dma_buf_put(), step f) above. Considering that it may be the last
reference to the dmabuf, buffer will be freed with pending attachments
left to the dmabuf which can show up as the 'memory leak'. This should
at least be reported as the WARN().
Signed-off-by: Charan Teja Reddy <charante(a)codeaurora.org>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1627043468-16381-1-git-send-e…
Signed-off-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/dma-buf/dma-buf.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 63d32261b63ff..474de2d988ca7 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -82,6 +82,7 @@ static void dma_buf_release(struct dentry *dentry)
if (dmabuf->resv == (struct dma_resv *)&dmabuf[1])
dma_resv_fini(dmabuf->resv);
+ WARN_ON(!list_empty(&dmabuf->attachments));
module_put(dmabuf->owner);
kfree(dmabuf->name);
kfree(dmabuf);
--
2.33.0
From: Charan Teja Reddy <charante(a)codeaurora.org>
[ Upstream commit f492283b157053e9555787262f058ae33096f568 ]
It is expected from the clients to follow the below steps on an imported
dmabuf fd:
a) dmabuf = dma_buf_get(fd) // Get the dmabuf from fd
b) dma_buf_attach(dmabuf); // Clients attach to the dmabuf
o Here the kernel does some slab allocations, say for
dma_buf_attachment and may be some other slab allocation in the
dmabuf->ops->attach().
c) Client may need to do dma_buf_map_attachment().
d) Accordingly dma_buf_unmap_attachment() should be called.
e) dma_buf_detach () // Clients detach to the dmabuf.
o Here the slab allocations made in b) are freed.
f) dma_buf_put(dmabuf) // Can free the dmabuf if it is the last
reference.
Now say an erroneous client failed at step c) above thus it directly
called dma_buf_put(), step f) above. Considering that it may be the last
reference to the dmabuf, buffer will be freed with pending attachments
left to the dmabuf which can show up as the 'memory leak'. This should
at least be reported as the WARN().
Signed-off-by: Charan Teja Reddy <charante(a)codeaurora.org>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1627043468-16381-1-git-send-e…
Signed-off-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/dma-buf/dma-buf.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 922416b3aaceb..93e9bf7382595 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -79,6 +79,7 @@ static void dma_buf_release(struct dentry *dentry)
if (dmabuf->resv == (struct dma_resv *)&dmabuf[1])
dma_resv_fini(dmabuf->resv);
+ WARN_ON(!list_empty(&dmabuf->attachments));
module_put(dmabuf->owner);
kfree(dmabuf->name);
kfree(dmabuf);
--
2.33.0
The patch below does not apply to the 5.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a4e17d65dafdd3513042d8f00404c9b6068a825c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pali=20Roh=C3=A1r?= <pali(a)kernel.org>
Date: Tue, 5 Oct 2021 20:09:41 +0200
Subject: [PATCH] PCI: aardvark: Fix PCIe Max Payload Size setting
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Change PCIe Max Payload Size setting in PCIe Device Control register to 512
bytes to align with PCIe Link Initialization sequence as defined in Marvell
Armada 3700 Functional Specification. According to the specification,
maximal Max Payload Size supported by this device is 512 bytes.
Without this kernel prints suspicious line:
pci 0000:01:00.0: Upstream bridge's Max Payload Size set to 256 (was 16384, max 512)
With this change it changes to:
pci 0000:01:00.0: Upstream bridge's Max Payload Size set to 256 (was 512, max 512)
Link: https://lore.kernel.org/r/20211005180952.6812-3-kabel@kernel.org
Fixes: 8c39d710363c ("PCI: aardvark: Add Aardvark PCI host controller driver")
Signed-off-by: Pali Rohár <pali(a)kernel.org>
Signed-off-by: Marek Behún <kabel(a)kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Reviewed-by: Marek Behún <kabel(a)kernel.org>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
index 596ebcfcc82d..884510630bae 100644
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -488,8 +488,9 @@ static void advk_pcie_setup_hw(struct advk_pcie *pcie)
reg = advk_readl(pcie, PCIE_CORE_PCIEXP_CAP + PCI_EXP_DEVCTL);
reg &= ~PCI_EXP_DEVCTL_RELAX_EN;
reg &= ~PCI_EXP_DEVCTL_NOSNOOP_EN;
+ reg &= ~PCI_EXP_DEVCTL_PAYLOAD;
reg &= ~PCI_EXP_DEVCTL_READRQ;
- reg |= PCI_EXP_DEVCTL_PAYLOAD; /* Set max payload size */
+ reg |= PCI_EXP_DEVCTL_PAYLOAD_512B;
reg |= PCI_EXP_DEVCTL_READRQ_512B;
advk_writel(pcie, reg, PCIE_CORE_PCIEXP_CAP + PCI_EXP_DEVCTL);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a4e17d65dafdd3513042d8f00404c9b6068a825c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pali=20Roh=C3=A1r?= <pali(a)kernel.org>
Date: Tue, 5 Oct 2021 20:09:41 +0200
Subject: [PATCH] PCI: aardvark: Fix PCIe Max Payload Size setting
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Change PCIe Max Payload Size setting in PCIe Device Control register to 512
bytes to align with PCIe Link Initialization sequence as defined in Marvell
Armada 3700 Functional Specification. According to the specification,
maximal Max Payload Size supported by this device is 512 bytes.
Without this kernel prints suspicious line:
pci 0000:01:00.0: Upstream bridge's Max Payload Size set to 256 (was 16384, max 512)
With this change it changes to:
pci 0000:01:00.0: Upstream bridge's Max Payload Size set to 256 (was 512, max 512)
Link: https://lore.kernel.org/r/20211005180952.6812-3-kabel@kernel.org
Fixes: 8c39d710363c ("PCI: aardvark: Add Aardvark PCI host controller driver")
Signed-off-by: Pali Rohár <pali(a)kernel.org>
Signed-off-by: Marek Behún <kabel(a)kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Reviewed-by: Marek Behún <kabel(a)kernel.org>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
index 596ebcfcc82d..884510630bae 100644
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -488,8 +488,9 @@ static void advk_pcie_setup_hw(struct advk_pcie *pcie)
reg = advk_readl(pcie, PCIE_CORE_PCIEXP_CAP + PCI_EXP_DEVCTL);
reg &= ~PCI_EXP_DEVCTL_RELAX_EN;
reg &= ~PCI_EXP_DEVCTL_NOSNOOP_EN;
+ reg &= ~PCI_EXP_DEVCTL_PAYLOAD;
reg &= ~PCI_EXP_DEVCTL_READRQ;
- reg |= PCI_EXP_DEVCTL_PAYLOAD; /* Set max payload size */
+ reg |= PCI_EXP_DEVCTL_PAYLOAD_512B;
reg |= PCI_EXP_DEVCTL_READRQ_512B;
advk_writel(pcie, reg, PCIE_CORE_PCIEXP_CAP + PCI_EXP_DEVCTL);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ea401499e943c307e6d44af6c2b4e068643e7884 Mon Sep 17 00:00:00 2001
From: Lukas Wunner <lukas(a)wunner.de>
Date: Sat, 31 Jul 2021 14:39:01 +0200
Subject: [PATCH] PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot
Reset
Stuart Hayes reports that an error handled by DPC at a Root Port results
in pciehp gratuitously bringing down a subordinate hotplug port:
RP -- UP -- DP -- UP -- DP (hotplug) -- EP
pciehp brings the slot down because the Link to the Endpoint goes down.
That is caused by a Hot Reset being propagated as a result of DPC.
Per PCIe Base Spec 5.0, section 6.6.1 "Conventional Reset":
For a Switch, the following must cause a hot reset to be sent on all
Downstream Ports: [...]
* The Data Link Layer of the Upstream Port reporting DL_Down status.
In Switches that support Link speeds greater than 5.0 GT/s, the
Upstream Port must direct the LTSSM of each Downstream Port to the
Hot Reset state, but not hold the LTSSMs in that state. This permits
each Downstream Port to begin Link training immediately after its
hot reset completes. This behavior is recommended for all Switches.
* Receiving a hot reset on the Upstream Port.
Once DPC recovers, pcie_do_recovery() walks down the hierarchy and
invokes pcie_portdrv_slot_reset() to restore each port's config space.
At that point, a hotplug interrupt is signaled per PCIe Base Spec r5.0,
section 6.7.3.4 "Software Notification of Hot-Plug Events":
If the Port is enabled for edge-triggered interrupt signaling using
MSI or MSI-X, an interrupt message must be sent every time the logical
AND of the following conditions transitions from FALSE to TRUE: [...]
* The Hot-Plug Interrupt Enable bit in the Slot Control register is
set to 1b.
* At least one hot-plug event status bit in the Slot Status register
and its associated enable bit in the Slot Control register are both
set to 1b.
Prevent pciehp from gratuitously bringing down the slot by clearing the
error-induced Data Link Layer State Changed event before restoring
config space. Afterwards, check whether the link has unexpectedly
failed to retrain and synthesize a DLLSC event if so.
Allow each pcie_port_service_driver (one of them being pciehp) to define
a slot_reset callback and re-use the existing pm_iter() function to
iterate over the callbacks.
Thereby, the Endpoint driver remains bound throughout error recovery and
may restore the device to working state.
Surprise removal during error recovery is detected through a Presence
Detect Changed event. The hotplug port is expected to not signal that
event as a result of a Hot Reset.
The issue isn't DPC-specific, it also occurs when an error is handled by
AER through aer_root_reset(). So while the issue was noticed only now,
it's been around since 2006 when AER support was first introduced.
[bhelgaas: drop PCI_ERROR_RECOVERY Kconfig, split pm_iter() rename to
preparatory patch]
Link: https://lore.kernel.org/linux-pci/08c046b0-c9f2-3489-eeef-7e7aca435bb9@gmai…
Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
Link: https://lore.kernel.org/r/251f4edcc04c14f873ff1c967bc686169cd07d2d.16276381…
Reported-by: Stuart Hayes <stuart.w.hayes(a)gmail.com>
Tested-by: Stuart Hayes <stuart.w.hayes(a)gmail.com>
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: stable(a)vger.kernel.org # v2.6.19+: ba952824e6c1: PCI/portdrv: Report reset for frozen channel
Cc: Keith Busch <kbusch(a)kernel.org>
diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 69fd401691be..918dccbc74b6 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -189,6 +189,8 @@ int pciehp_get_attention_status(struct hotplug_slot *hotplug_slot, u8 *status);
int pciehp_set_raw_indicator_status(struct hotplug_slot *h_slot, u8 status);
int pciehp_get_raw_indicator_status(struct hotplug_slot *h_slot, u8 *status);
+int pciehp_slot_reset(struct pcie_device *dev);
+
static inline const char *slot_name(struct controller *ctrl)
{
return hotplug_slot_name(&ctrl->hotplug_slot);
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index ad3393930ecb..f34114d45259 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -351,6 +351,8 @@ static struct pcie_port_service_driver hpdriver_portdrv = {
.runtime_suspend = pciehp_runtime_suspend,
.runtime_resume = pciehp_runtime_resume,
#endif /* PM */
+
+ .slot_reset = pciehp_slot_reset,
};
int __init pcie_hp_init(void)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 3024d7e85e6a..83a0fa119cae 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -862,6 +862,32 @@ void pcie_disable_interrupt(struct controller *ctrl)
pcie_write_cmd(ctrl, 0, mask);
}
+/**
+ * pciehp_slot_reset() - ignore link event caused by error-induced hot reset
+ * @dev: PCI Express port service device
+ *
+ * Called from pcie_portdrv_slot_reset() after AER or DPC initiated a reset
+ * further up in the hierarchy to recover from an error. The reset was
+ * propagated down to this hotplug port. Ignore the resulting link flap.
+ * If the link failed to retrain successfully, synthesize the ignored event.
+ * Surprise removal during reset is detected through Presence Detect Changed.
+ */
+int pciehp_slot_reset(struct pcie_device *dev)
+{
+ struct controller *ctrl = get_service_data(dev);
+
+ if (ctrl->state != ON_STATE)
+ return 0;
+
+ pcie_capability_write_word(dev->port, PCI_EXP_SLTSTA,
+ PCI_EXP_SLTSTA_DLLSC);
+
+ if (!pciehp_check_link_active(ctrl))
+ pciehp_request(ctrl, PCI_EXP_SLTSTA_DLLSC);
+
+ return 0;
+}
+
/*
* pciehp has a 1:1 bus:slot relationship so we ultimately want a secondary
* bus reset of the bridge, but at the same time we want to ensure that it is
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 6126ee4676a7..41fe1ffd5907 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -85,6 +85,8 @@ struct pcie_port_service_driver {
int (*runtime_suspend)(struct pcie_device *dev);
int (*runtime_resume)(struct pcie_device *dev);
+ int (*slot_reset)(struct pcie_device *dev);
+
/* Device driver may resume normal operations */
void (*error_resume)(struct pci_dev *dev);
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index c7ff1eea225a..1af74c3d9d5d 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -160,6 +160,9 @@ static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
{
+ size_t off = offsetof(struct pcie_port_service_driver, slot_reset);
+ device_for_each_child(&dev->dev, &off, pcie_port_device_iter);
+
pci_restore_state(dev);
pci_save_state(dev);
return PCI_ERS_RESULT_RECOVERED;
The patch below does not apply to the 5.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ea401499e943c307e6d44af6c2b4e068643e7884 Mon Sep 17 00:00:00 2001
From: Lukas Wunner <lukas(a)wunner.de>
Date: Sat, 31 Jul 2021 14:39:01 +0200
Subject: [PATCH] PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot
Reset
Stuart Hayes reports that an error handled by DPC at a Root Port results
in pciehp gratuitously bringing down a subordinate hotplug port:
RP -- UP -- DP -- UP -- DP (hotplug) -- EP
pciehp brings the slot down because the Link to the Endpoint goes down.
That is caused by a Hot Reset being propagated as a result of DPC.
Per PCIe Base Spec 5.0, section 6.6.1 "Conventional Reset":
For a Switch, the following must cause a hot reset to be sent on all
Downstream Ports: [...]
* The Data Link Layer of the Upstream Port reporting DL_Down status.
In Switches that support Link speeds greater than 5.0 GT/s, the
Upstream Port must direct the LTSSM of each Downstream Port to the
Hot Reset state, but not hold the LTSSMs in that state. This permits
each Downstream Port to begin Link training immediately after its
hot reset completes. This behavior is recommended for all Switches.
* Receiving a hot reset on the Upstream Port.
Once DPC recovers, pcie_do_recovery() walks down the hierarchy and
invokes pcie_portdrv_slot_reset() to restore each port's config space.
At that point, a hotplug interrupt is signaled per PCIe Base Spec r5.0,
section 6.7.3.4 "Software Notification of Hot-Plug Events":
If the Port is enabled for edge-triggered interrupt signaling using
MSI or MSI-X, an interrupt message must be sent every time the logical
AND of the following conditions transitions from FALSE to TRUE: [...]
* The Hot-Plug Interrupt Enable bit in the Slot Control register is
set to 1b.
* At least one hot-plug event status bit in the Slot Status register
and its associated enable bit in the Slot Control register are both
set to 1b.
Prevent pciehp from gratuitously bringing down the slot by clearing the
error-induced Data Link Layer State Changed event before restoring
config space. Afterwards, check whether the link has unexpectedly
failed to retrain and synthesize a DLLSC event if so.
Allow each pcie_port_service_driver (one of them being pciehp) to define
a slot_reset callback and re-use the existing pm_iter() function to
iterate over the callbacks.
Thereby, the Endpoint driver remains bound throughout error recovery and
may restore the device to working state.
Surprise removal during error recovery is detected through a Presence
Detect Changed event. The hotplug port is expected to not signal that
event as a result of a Hot Reset.
The issue isn't DPC-specific, it also occurs when an error is handled by
AER through aer_root_reset(). So while the issue was noticed only now,
it's been around since 2006 when AER support was first introduced.
[bhelgaas: drop PCI_ERROR_RECOVERY Kconfig, split pm_iter() rename to
preparatory patch]
Link: https://lore.kernel.org/linux-pci/08c046b0-c9f2-3489-eeef-7e7aca435bb9@gmai…
Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
Link: https://lore.kernel.org/r/251f4edcc04c14f873ff1c967bc686169cd07d2d.16276381…
Reported-by: Stuart Hayes <stuart.w.hayes(a)gmail.com>
Tested-by: Stuart Hayes <stuart.w.hayes(a)gmail.com>
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: stable(a)vger.kernel.org # v2.6.19+: ba952824e6c1: PCI/portdrv: Report reset for frozen channel
Cc: Keith Busch <kbusch(a)kernel.org>
diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 69fd401691be..918dccbc74b6 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -189,6 +189,8 @@ int pciehp_get_attention_status(struct hotplug_slot *hotplug_slot, u8 *status);
int pciehp_set_raw_indicator_status(struct hotplug_slot *h_slot, u8 status);
int pciehp_get_raw_indicator_status(struct hotplug_slot *h_slot, u8 *status);
+int pciehp_slot_reset(struct pcie_device *dev);
+
static inline const char *slot_name(struct controller *ctrl)
{
return hotplug_slot_name(&ctrl->hotplug_slot);
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index ad3393930ecb..f34114d45259 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -351,6 +351,8 @@ static struct pcie_port_service_driver hpdriver_portdrv = {
.runtime_suspend = pciehp_runtime_suspend,
.runtime_resume = pciehp_runtime_resume,
#endif /* PM */
+
+ .slot_reset = pciehp_slot_reset,
};
int __init pcie_hp_init(void)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 3024d7e85e6a..83a0fa119cae 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -862,6 +862,32 @@ void pcie_disable_interrupt(struct controller *ctrl)
pcie_write_cmd(ctrl, 0, mask);
}
+/**
+ * pciehp_slot_reset() - ignore link event caused by error-induced hot reset
+ * @dev: PCI Express port service device
+ *
+ * Called from pcie_portdrv_slot_reset() after AER or DPC initiated a reset
+ * further up in the hierarchy to recover from an error. The reset was
+ * propagated down to this hotplug port. Ignore the resulting link flap.
+ * If the link failed to retrain successfully, synthesize the ignored event.
+ * Surprise removal during reset is detected through Presence Detect Changed.
+ */
+int pciehp_slot_reset(struct pcie_device *dev)
+{
+ struct controller *ctrl = get_service_data(dev);
+
+ if (ctrl->state != ON_STATE)
+ return 0;
+
+ pcie_capability_write_word(dev->port, PCI_EXP_SLTSTA,
+ PCI_EXP_SLTSTA_DLLSC);
+
+ if (!pciehp_check_link_active(ctrl))
+ pciehp_request(ctrl, PCI_EXP_SLTSTA_DLLSC);
+
+ return 0;
+}
+
/*
* pciehp has a 1:1 bus:slot relationship so we ultimately want a secondary
* bus reset of the bridge, but at the same time we want to ensure that it is
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 6126ee4676a7..41fe1ffd5907 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -85,6 +85,8 @@ struct pcie_port_service_driver {
int (*runtime_suspend)(struct pcie_device *dev);
int (*runtime_resume)(struct pcie_device *dev);
+ int (*slot_reset)(struct pcie_device *dev);
+
/* Device driver may resume normal operations */
void (*error_resume)(struct pci_dev *dev);
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index c7ff1eea225a..1af74c3d9d5d 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -160,6 +160,9 @@ static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
{
+ size_t off = offsetof(struct pcie_port_service_driver, slot_reset);
+ device_for_each_child(&dev->dev, &off, pcie_port_device_iter);
+
pci_restore_state(dev);
pci_save_state(dev);
return PCI_ERS_RESULT_RECOVERED;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ea401499e943c307e6d44af6c2b4e068643e7884 Mon Sep 17 00:00:00 2001
From: Lukas Wunner <lukas(a)wunner.de>
Date: Sat, 31 Jul 2021 14:39:01 +0200
Subject: [PATCH] PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot
Reset
Stuart Hayes reports that an error handled by DPC at a Root Port results
in pciehp gratuitously bringing down a subordinate hotplug port:
RP -- UP -- DP -- UP -- DP (hotplug) -- EP
pciehp brings the slot down because the Link to the Endpoint goes down.
That is caused by a Hot Reset being propagated as a result of DPC.
Per PCIe Base Spec 5.0, section 6.6.1 "Conventional Reset":
For a Switch, the following must cause a hot reset to be sent on all
Downstream Ports: [...]
* The Data Link Layer of the Upstream Port reporting DL_Down status.
In Switches that support Link speeds greater than 5.0 GT/s, the
Upstream Port must direct the LTSSM of each Downstream Port to the
Hot Reset state, but not hold the LTSSMs in that state. This permits
each Downstream Port to begin Link training immediately after its
hot reset completes. This behavior is recommended for all Switches.
* Receiving a hot reset on the Upstream Port.
Once DPC recovers, pcie_do_recovery() walks down the hierarchy and
invokes pcie_portdrv_slot_reset() to restore each port's config space.
At that point, a hotplug interrupt is signaled per PCIe Base Spec r5.0,
section 6.7.3.4 "Software Notification of Hot-Plug Events":
If the Port is enabled for edge-triggered interrupt signaling using
MSI or MSI-X, an interrupt message must be sent every time the logical
AND of the following conditions transitions from FALSE to TRUE: [...]
* The Hot-Plug Interrupt Enable bit in the Slot Control register is
set to 1b.
* At least one hot-plug event status bit in the Slot Status register
and its associated enable bit in the Slot Control register are both
set to 1b.
Prevent pciehp from gratuitously bringing down the slot by clearing the
error-induced Data Link Layer State Changed event before restoring
config space. Afterwards, check whether the link has unexpectedly
failed to retrain and synthesize a DLLSC event if so.
Allow each pcie_port_service_driver (one of them being pciehp) to define
a slot_reset callback and re-use the existing pm_iter() function to
iterate over the callbacks.
Thereby, the Endpoint driver remains bound throughout error recovery and
may restore the device to working state.
Surprise removal during error recovery is detected through a Presence
Detect Changed event. The hotplug port is expected to not signal that
event as a result of a Hot Reset.
The issue isn't DPC-specific, it also occurs when an error is handled by
AER through aer_root_reset(). So while the issue was noticed only now,
it's been around since 2006 when AER support was first introduced.
[bhelgaas: drop PCI_ERROR_RECOVERY Kconfig, split pm_iter() rename to
preparatory patch]
Link: https://lore.kernel.org/linux-pci/08c046b0-c9f2-3489-eeef-7e7aca435bb9@gmai…
Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
Link: https://lore.kernel.org/r/251f4edcc04c14f873ff1c967bc686169cd07d2d.16276381…
Reported-by: Stuart Hayes <stuart.w.hayes(a)gmail.com>
Tested-by: Stuart Hayes <stuart.w.hayes(a)gmail.com>
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: stable(a)vger.kernel.org # v2.6.19+: ba952824e6c1: PCI/portdrv: Report reset for frozen channel
Cc: Keith Busch <kbusch(a)kernel.org>
diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 69fd401691be..918dccbc74b6 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -189,6 +189,8 @@ int pciehp_get_attention_status(struct hotplug_slot *hotplug_slot, u8 *status);
int pciehp_set_raw_indicator_status(struct hotplug_slot *h_slot, u8 status);
int pciehp_get_raw_indicator_status(struct hotplug_slot *h_slot, u8 *status);
+int pciehp_slot_reset(struct pcie_device *dev);
+
static inline const char *slot_name(struct controller *ctrl)
{
return hotplug_slot_name(&ctrl->hotplug_slot);
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index ad3393930ecb..f34114d45259 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -351,6 +351,8 @@ static struct pcie_port_service_driver hpdriver_portdrv = {
.runtime_suspend = pciehp_runtime_suspend,
.runtime_resume = pciehp_runtime_resume,
#endif /* PM */
+
+ .slot_reset = pciehp_slot_reset,
};
int __init pcie_hp_init(void)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 3024d7e85e6a..83a0fa119cae 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -862,6 +862,32 @@ void pcie_disable_interrupt(struct controller *ctrl)
pcie_write_cmd(ctrl, 0, mask);
}
+/**
+ * pciehp_slot_reset() - ignore link event caused by error-induced hot reset
+ * @dev: PCI Express port service device
+ *
+ * Called from pcie_portdrv_slot_reset() after AER or DPC initiated a reset
+ * further up in the hierarchy to recover from an error. The reset was
+ * propagated down to this hotplug port. Ignore the resulting link flap.
+ * If the link failed to retrain successfully, synthesize the ignored event.
+ * Surprise removal during reset is detected through Presence Detect Changed.
+ */
+int pciehp_slot_reset(struct pcie_device *dev)
+{
+ struct controller *ctrl = get_service_data(dev);
+
+ if (ctrl->state != ON_STATE)
+ return 0;
+
+ pcie_capability_write_word(dev->port, PCI_EXP_SLTSTA,
+ PCI_EXP_SLTSTA_DLLSC);
+
+ if (!pciehp_check_link_active(ctrl))
+ pciehp_request(ctrl, PCI_EXP_SLTSTA_DLLSC);
+
+ return 0;
+}
+
/*
* pciehp has a 1:1 bus:slot relationship so we ultimately want a secondary
* bus reset of the bridge, but at the same time we want to ensure that it is
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 6126ee4676a7..41fe1ffd5907 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -85,6 +85,8 @@ struct pcie_port_service_driver {
int (*runtime_suspend)(struct pcie_device *dev);
int (*runtime_resume)(struct pcie_device *dev);
+ int (*slot_reset)(struct pcie_device *dev);
+
/* Device driver may resume normal operations */
void (*error_resume)(struct pci_dev *dev);
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index c7ff1eea225a..1af74c3d9d5d 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -160,6 +160,9 @@ static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
{
+ size_t off = offsetof(struct pcie_port_service_driver, slot_reset);
+ device_for_each_child(&dev->dev, &off, pcie_port_device_iter);
+
pci_restore_state(dev);
pci_save_state(dev);
return PCI_ERS_RESULT_RECOVERED;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ea401499e943c307e6d44af6c2b4e068643e7884 Mon Sep 17 00:00:00 2001
From: Lukas Wunner <lukas(a)wunner.de>
Date: Sat, 31 Jul 2021 14:39:01 +0200
Subject: [PATCH] PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot
Reset
Stuart Hayes reports that an error handled by DPC at a Root Port results
in pciehp gratuitously bringing down a subordinate hotplug port:
RP -- UP -- DP -- UP -- DP (hotplug) -- EP
pciehp brings the slot down because the Link to the Endpoint goes down.
That is caused by a Hot Reset being propagated as a result of DPC.
Per PCIe Base Spec 5.0, section 6.6.1 "Conventional Reset":
For a Switch, the following must cause a hot reset to be sent on all
Downstream Ports: [...]
* The Data Link Layer of the Upstream Port reporting DL_Down status.
In Switches that support Link speeds greater than 5.0 GT/s, the
Upstream Port must direct the LTSSM of each Downstream Port to the
Hot Reset state, but not hold the LTSSMs in that state. This permits
each Downstream Port to begin Link training immediately after its
hot reset completes. This behavior is recommended for all Switches.
* Receiving a hot reset on the Upstream Port.
Once DPC recovers, pcie_do_recovery() walks down the hierarchy and
invokes pcie_portdrv_slot_reset() to restore each port's config space.
At that point, a hotplug interrupt is signaled per PCIe Base Spec r5.0,
section 6.7.3.4 "Software Notification of Hot-Plug Events":
If the Port is enabled for edge-triggered interrupt signaling using
MSI or MSI-X, an interrupt message must be sent every time the logical
AND of the following conditions transitions from FALSE to TRUE: [...]
* The Hot-Plug Interrupt Enable bit in the Slot Control register is
set to 1b.
* At least one hot-plug event status bit in the Slot Status register
and its associated enable bit in the Slot Control register are both
set to 1b.
Prevent pciehp from gratuitously bringing down the slot by clearing the
error-induced Data Link Layer State Changed event before restoring
config space. Afterwards, check whether the link has unexpectedly
failed to retrain and synthesize a DLLSC event if so.
Allow each pcie_port_service_driver (one of them being pciehp) to define
a slot_reset callback and re-use the existing pm_iter() function to
iterate over the callbacks.
Thereby, the Endpoint driver remains bound throughout error recovery and
may restore the device to working state.
Surprise removal during error recovery is detected through a Presence
Detect Changed event. The hotplug port is expected to not signal that
event as a result of a Hot Reset.
The issue isn't DPC-specific, it also occurs when an error is handled by
AER through aer_root_reset(). So while the issue was noticed only now,
it's been around since 2006 when AER support was first introduced.
[bhelgaas: drop PCI_ERROR_RECOVERY Kconfig, split pm_iter() rename to
preparatory patch]
Link: https://lore.kernel.org/linux-pci/08c046b0-c9f2-3489-eeef-7e7aca435bb9@gmai…
Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
Link: https://lore.kernel.org/r/251f4edcc04c14f873ff1c967bc686169cd07d2d.16276381…
Reported-by: Stuart Hayes <stuart.w.hayes(a)gmail.com>
Tested-by: Stuart Hayes <stuart.w.hayes(a)gmail.com>
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: stable(a)vger.kernel.org # v2.6.19+: ba952824e6c1: PCI/portdrv: Report reset for frozen channel
Cc: Keith Busch <kbusch(a)kernel.org>
diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 69fd401691be..918dccbc74b6 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -189,6 +189,8 @@ int pciehp_get_attention_status(struct hotplug_slot *hotplug_slot, u8 *status);
int pciehp_set_raw_indicator_status(struct hotplug_slot *h_slot, u8 status);
int pciehp_get_raw_indicator_status(struct hotplug_slot *h_slot, u8 *status);
+int pciehp_slot_reset(struct pcie_device *dev);
+
static inline const char *slot_name(struct controller *ctrl)
{
return hotplug_slot_name(&ctrl->hotplug_slot);
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index ad3393930ecb..f34114d45259 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -351,6 +351,8 @@ static struct pcie_port_service_driver hpdriver_portdrv = {
.runtime_suspend = pciehp_runtime_suspend,
.runtime_resume = pciehp_runtime_resume,
#endif /* PM */
+
+ .slot_reset = pciehp_slot_reset,
};
int __init pcie_hp_init(void)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 3024d7e85e6a..83a0fa119cae 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -862,6 +862,32 @@ void pcie_disable_interrupt(struct controller *ctrl)
pcie_write_cmd(ctrl, 0, mask);
}
+/**
+ * pciehp_slot_reset() - ignore link event caused by error-induced hot reset
+ * @dev: PCI Express port service device
+ *
+ * Called from pcie_portdrv_slot_reset() after AER or DPC initiated a reset
+ * further up in the hierarchy to recover from an error. The reset was
+ * propagated down to this hotplug port. Ignore the resulting link flap.
+ * If the link failed to retrain successfully, synthesize the ignored event.
+ * Surprise removal during reset is detected through Presence Detect Changed.
+ */
+int pciehp_slot_reset(struct pcie_device *dev)
+{
+ struct controller *ctrl = get_service_data(dev);
+
+ if (ctrl->state != ON_STATE)
+ return 0;
+
+ pcie_capability_write_word(dev->port, PCI_EXP_SLTSTA,
+ PCI_EXP_SLTSTA_DLLSC);
+
+ if (!pciehp_check_link_active(ctrl))
+ pciehp_request(ctrl, PCI_EXP_SLTSTA_DLLSC);
+
+ return 0;
+}
+
/*
* pciehp has a 1:1 bus:slot relationship so we ultimately want a secondary
* bus reset of the bridge, but at the same time we want to ensure that it is
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 6126ee4676a7..41fe1ffd5907 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -85,6 +85,8 @@ struct pcie_port_service_driver {
int (*runtime_suspend)(struct pcie_device *dev);
int (*runtime_resume)(struct pcie_device *dev);
+ int (*slot_reset)(struct pcie_device *dev);
+
/* Device driver may resume normal operations */
void (*error_resume)(struct pci_dev *dev);
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index c7ff1eea225a..1af74c3d9d5d 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -160,6 +160,9 @@ static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
{
+ size_t off = offsetof(struct pcie_port_service_driver, slot_reset);
+ device_for_each_child(&dev->dev, &off, pcie_port_device_iter);
+
pci_restore_state(dev);
pci_save_state(dev);
return PCI_ERS_RESULT_RECOVERED;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 84e1b4045dc887b78bdc87d92927093dc3a465aa Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pali=20Roh=C3=A1r?= <pali(a)kernel.org>
Date: Thu, 28 Oct 2021 20:56:57 +0200
Subject: [PATCH] PCI: aardvark: Set PCI Bridge Class Code to PCI Bridge
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Aardvark controller has something like config space of a Root Port
available at offset 0x0 of internal registers - these registers are used
for implementation of the emulated bridge.
The default value of Class Code of this bridge corresponds to a RAID Mass
storage controller, though. (This is probably intended for when the
controller is used as Endpoint.)
Change the Class Code to correspond to a PCI Bridge.
Add comment explaining this change.
Link: https://lore.kernel.org/r/20211028185659.20329-6-kabel@kernel.org
Fixes: 8a3ebd8de328 ("PCI: aardvark: Implement emulated root PCI bridge config space")
Signed-off-by: Pali Rohár <pali(a)kernel.org>
Signed-off-by: Marek Behún <kabel(a)kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
index d7db03da4d1c..ddca45415c65 100644
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -511,6 +511,26 @@ static void advk_pcie_setup_hw(struct advk_pcie *pcie)
reg = (PCI_VENDOR_ID_MARVELL << 16) | PCI_VENDOR_ID_MARVELL;
advk_writel(pcie, reg, VENDOR_ID_REG);
+ /*
+ * Change Class Code of PCI Bridge device to PCI Bridge (0x600400),
+ * because the default value is Mass storage controller (0x010400).
+ *
+ * Note that this Aardvark PCI Bridge does not have compliant Type 1
+ * Configuration Space and it even cannot be accessed via Aardvark's
+ * PCI config space access method. Something like config space is
+ * available in internal Aardvark registers starting at offset 0x0
+ * and is reported as Type 0. In range 0x10 - 0x34 it has totally
+ * different registers.
+ *
+ * Therefore driver uses emulation of PCI Bridge which emulates
+ * access to configuration space via internal Aardvark registers or
+ * emulated configuration buffer.
+ */
+ reg = advk_readl(pcie, PCIE_CORE_DEV_REV_REG);
+ reg &= ~0xffffff00;
+ reg |= (PCI_CLASS_BRIDGE_PCI << 8) << 8;
+ advk_writel(pcie, reg, PCIE_CORE_DEV_REV_REG);
+
/* Disable Root Bridge I/O space, memory space and bus mastering */
reg = advk_readl(pcie, PCIE_CORE_CMD_STATUS_REG);
reg &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From bc4fac42e5f8460af09c0a7f2f1915be09e20c71 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pali=20Roh=C3=A1r?= <pali(a)kernel.org>
Date: Thu, 28 Oct 2021 20:56:58 +0200
Subject: [PATCH] PCI: aardvark: Fix support for PCI_BRIDGE_CTL_BUS_RESET on
emulated bridge
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Aardvark supports PCIe Hot Reset via PCIE_CORE_CTRL1_REG.
Use it for implementing PCI_BRIDGE_CTL_BUS_RESET bit of PCI_BRIDGE_CONTROL
register on emulated bridge.
With this, the function pci_reset_secondary_bus() starts working and can
reset connected PCIe card. Custom userspace script [1] which uses setpci
can trigger PCIe Hot Reset and reset the card manually.
[1] https://alexforencich.com/wiki/en/pcie/hot-reset-linux
Link: https://lore.kernel.org/r/20211028185659.20329-7-kabel@kernel.org
Fixes: 8a3ebd8de328 ("PCI: aardvark: Implement emulated root PCI bridge config space")
Signed-off-by: Pali Rohár <pali(a)kernel.org>
Signed-off-by: Marek Behún <kabel(a)kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
index ddca45415c65..c3b725afa11f 100644
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -773,6 +773,22 @@ advk_pci_bridge_emul_base_conf_read(struct pci_bridge_emul *bridge,
*value = advk_readl(pcie, PCIE_CORE_CMD_STATUS_REG);
return PCI_BRIDGE_EMUL_HANDLED;
+ case PCI_INTERRUPT_LINE: {
+ /*
+ * From the whole 32bit register we support reading from HW only
+ * one bit: PCI_BRIDGE_CTL_BUS_RESET.
+ * Other bits are retrieved only from emulated config buffer.
+ */
+ __le32 *cfgspace = (__le32 *)&bridge->conf;
+ u32 val = le32_to_cpu(cfgspace[PCI_INTERRUPT_LINE / 4]);
+ if (advk_readl(pcie, PCIE_CORE_CTRL1_REG) & HOT_RESET_GEN)
+ val |= PCI_BRIDGE_CTL_BUS_RESET << 16;
+ else
+ val &= ~(PCI_BRIDGE_CTL_BUS_RESET << 16);
+ *value = val;
+ return PCI_BRIDGE_EMUL_HANDLED;
+ }
+
default:
return PCI_BRIDGE_EMUL_NOT_HANDLED;
}
@@ -789,6 +805,17 @@ advk_pci_bridge_emul_base_conf_write(struct pci_bridge_emul *bridge,
advk_writel(pcie, new, PCIE_CORE_CMD_STATUS_REG);
break;
+ case PCI_INTERRUPT_LINE:
+ if (mask & (PCI_BRIDGE_CTL_BUS_RESET << 16)) {
+ u32 val = advk_readl(pcie, PCIE_CORE_CTRL1_REG);
+ if (new & (PCI_BRIDGE_CTL_BUS_RESET << 16))
+ val |= HOT_RESET_GEN;
+ else
+ val &= ~HOT_RESET_GEN;
+ advk_writel(pcie, val, PCIE_CORE_CTRL1_REG);
+ }
+ break;
+
default:
break;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 771153fc884f566a89af2d30033b7f3bc6e24e84 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pali=20Roh=C3=A1r?= <pali(a)kernel.org>
Date: Thu, 28 Oct 2021 20:56:56 +0200
Subject: [PATCH] PCI: aardvark: Fix support for bus mastering and PCI_COMMAND
on emulated bridge
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
>From very vague, ambiguous and incomplete information from Marvell we
deduced that the 32-bit Aardvark register at address 0x4
(PCIE_CORE_CMD_STATUS_REG), which is not documented for Root Complex mode
in the Functional Specification (only for Endpoint mode), controls two
16-bit PCIe registers: Command Register and Status Registers of PCIe Root
Port.
This means that bit 2 controls bus mastering and forwarding of memory and
I/O requests in the upstream direction. According to PCI specifications
bits [0:2] of Command Register, this should be by default disabled on
reset. So explicitly disable these bits at early setup of the Aardvark
driver.
Remove code which unconditionally enables all 3 bits and let kernel code
(via pci_set_master() function) to handle bus mastering of Root PCIe
Bridge via emulated PCI_COMMAND on emulated bridge.
Link: https://lore.kernel.org/r/20211028185659.20329-5-kabel@kernel.org
Fixes: 8a3ebd8de328 ("PCI: aardvark: Implement emulated root PCI bridge config space")
Signed-off-by: Pali Rohár <pali(a)kernel.org>
Signed-off-by: Marek Behún <kabel(a)kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Cc: stable(a)vger.kernel.org # b2a56469d550 ("PCI: aardvark: Add FIXME comment for PCIE_CORE_CMD_STATUS_REG access")
diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
index 389ebba1dd9b..d7db03da4d1c 100644
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -31,9 +31,6 @@
/* PCIe core registers */
#define PCIE_CORE_DEV_ID_REG 0x0
#define PCIE_CORE_CMD_STATUS_REG 0x4
-#define PCIE_CORE_CMD_IO_ACCESS_EN BIT(0)
-#define PCIE_CORE_CMD_MEM_ACCESS_EN BIT(1)
-#define PCIE_CORE_CMD_MEM_IO_REQ_EN BIT(2)
#define PCIE_CORE_DEV_REV_REG 0x8
#define PCIE_CORE_PCIEXP_CAP 0xc0
#define PCIE_CORE_ERR_CAPCTL_REG 0x118
@@ -514,6 +511,11 @@ static void advk_pcie_setup_hw(struct advk_pcie *pcie)
reg = (PCI_VENDOR_ID_MARVELL << 16) | PCI_VENDOR_ID_MARVELL;
advk_writel(pcie, reg, VENDOR_ID_REG);
+ /* Disable Root Bridge I/O space, memory space and bus mastering */
+ reg = advk_readl(pcie, PCIE_CORE_CMD_STATUS_REG);
+ reg &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER);
+ advk_writel(pcie, reg, PCIE_CORE_CMD_STATUS_REG);
+
/* Set Advanced Error Capabilities and Control PF0 register */
reg = PCIE_CORE_ERR_CAPCTL_ECRC_CHK_TX |
PCIE_CORE_ERR_CAPCTL_ECRC_CHK_TX_EN |
@@ -612,19 +614,6 @@ static void advk_pcie_setup_hw(struct advk_pcie *pcie)
advk_pcie_disable_ob_win(pcie, i);
advk_pcie_train_link(pcie);
-
- /*
- * FIXME: The following register update is suspicious. This register is
- * applicable only when the PCI controller is configured for Endpoint
- * mode, not as a Root Complex. But apparently when this code is
- * removed, some cards stop working. This should be investigated and
- * a comment explaining this should be put here.
- */
- reg = advk_readl(pcie, PCIE_CORE_CMD_STATUS_REG);
- reg |= PCIE_CORE_CMD_MEM_ACCESS_EN |
- PCIE_CORE_CMD_IO_ACCESS_EN |
- PCIE_CORE_CMD_MEM_IO_REQ_EN;
- advk_writel(pcie, reg, PCIE_CORE_CMD_STATUS_REG);
}
static int advk_pcie_check_pio_status(struct advk_pcie *pcie, bool allow_crs, u32 *val)
@@ -753,6 +742,37 @@ static int advk_pcie_wait_pio(struct advk_pcie *pcie)
return -ETIMEDOUT;
}
+static pci_bridge_emul_read_status_t
+advk_pci_bridge_emul_base_conf_read(struct pci_bridge_emul *bridge,
+ int reg, u32 *value)
+{
+ struct advk_pcie *pcie = bridge->data;
+
+ switch (reg) {
+ case PCI_COMMAND:
+ *value = advk_readl(pcie, PCIE_CORE_CMD_STATUS_REG);
+ return PCI_BRIDGE_EMUL_HANDLED;
+
+ default:
+ return PCI_BRIDGE_EMUL_NOT_HANDLED;
+ }
+}
+
+static void
+advk_pci_bridge_emul_base_conf_write(struct pci_bridge_emul *bridge,
+ int reg, u32 old, u32 new, u32 mask)
+{
+ struct advk_pcie *pcie = bridge->data;
+
+ switch (reg) {
+ case PCI_COMMAND:
+ advk_writel(pcie, new, PCIE_CORE_CMD_STATUS_REG);
+ break;
+
+ default:
+ break;
+ }
+}
static pci_bridge_emul_read_status_t
advk_pci_bridge_emul_pcie_conf_read(struct pci_bridge_emul *bridge,
@@ -854,6 +874,8 @@ advk_pci_bridge_emul_pcie_conf_write(struct pci_bridge_emul *bridge,
}
static struct pci_bridge_emul_ops advk_pci_bridge_emul_ops = {
+ .read_base = advk_pci_bridge_emul_base_conf_read,
+ .write_base = advk_pci_bridge_emul_base_conf_write,
.read_pcie = advk_pci_bridge_emul_pcie_conf_read,
.write_pcie = advk_pci_bridge_emul_pcie_conf_write,
};
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 771153fc884f566a89af2d30033b7f3bc6e24e84 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pali=20Roh=C3=A1r?= <pali(a)kernel.org>
Date: Thu, 28 Oct 2021 20:56:56 +0200
Subject: [PATCH] PCI: aardvark: Fix support for bus mastering and PCI_COMMAND
on emulated bridge
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
>From very vague, ambiguous and incomplete information from Marvell we
deduced that the 32-bit Aardvark register at address 0x4
(PCIE_CORE_CMD_STATUS_REG), which is not documented for Root Complex mode
in the Functional Specification (only for Endpoint mode), controls two
16-bit PCIe registers: Command Register and Status Registers of PCIe Root
Port.
This means that bit 2 controls bus mastering and forwarding of memory and
I/O requests in the upstream direction. According to PCI specifications
bits [0:2] of Command Register, this should be by default disabled on
reset. So explicitly disable these bits at early setup of the Aardvark
driver.
Remove code which unconditionally enables all 3 bits and let kernel code
(via pci_set_master() function) to handle bus mastering of Root PCIe
Bridge via emulated PCI_COMMAND on emulated bridge.
Link: https://lore.kernel.org/r/20211028185659.20329-5-kabel@kernel.org
Fixes: 8a3ebd8de328 ("PCI: aardvark: Implement emulated root PCI bridge config space")
Signed-off-by: Pali Rohár <pali(a)kernel.org>
Signed-off-by: Marek Behún <kabel(a)kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Cc: stable(a)vger.kernel.org # b2a56469d550 ("PCI: aardvark: Add FIXME comment for PCIE_CORE_CMD_STATUS_REG access")
diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
index 389ebba1dd9b..d7db03da4d1c 100644
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -31,9 +31,6 @@
/* PCIe core registers */
#define PCIE_CORE_DEV_ID_REG 0x0
#define PCIE_CORE_CMD_STATUS_REG 0x4
-#define PCIE_CORE_CMD_IO_ACCESS_EN BIT(0)
-#define PCIE_CORE_CMD_MEM_ACCESS_EN BIT(1)
-#define PCIE_CORE_CMD_MEM_IO_REQ_EN BIT(2)
#define PCIE_CORE_DEV_REV_REG 0x8
#define PCIE_CORE_PCIEXP_CAP 0xc0
#define PCIE_CORE_ERR_CAPCTL_REG 0x118
@@ -514,6 +511,11 @@ static void advk_pcie_setup_hw(struct advk_pcie *pcie)
reg = (PCI_VENDOR_ID_MARVELL << 16) | PCI_VENDOR_ID_MARVELL;
advk_writel(pcie, reg, VENDOR_ID_REG);
+ /* Disable Root Bridge I/O space, memory space and bus mastering */
+ reg = advk_readl(pcie, PCIE_CORE_CMD_STATUS_REG);
+ reg &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER);
+ advk_writel(pcie, reg, PCIE_CORE_CMD_STATUS_REG);
+
/* Set Advanced Error Capabilities and Control PF0 register */
reg = PCIE_CORE_ERR_CAPCTL_ECRC_CHK_TX |
PCIE_CORE_ERR_CAPCTL_ECRC_CHK_TX_EN |
@@ -612,19 +614,6 @@ static void advk_pcie_setup_hw(struct advk_pcie *pcie)
advk_pcie_disable_ob_win(pcie, i);
advk_pcie_train_link(pcie);
-
- /*
- * FIXME: The following register update is suspicious. This register is
- * applicable only when the PCI controller is configured for Endpoint
- * mode, not as a Root Complex. But apparently when this code is
- * removed, some cards stop working. This should be investigated and
- * a comment explaining this should be put here.
- */
- reg = advk_readl(pcie, PCIE_CORE_CMD_STATUS_REG);
- reg |= PCIE_CORE_CMD_MEM_ACCESS_EN |
- PCIE_CORE_CMD_IO_ACCESS_EN |
- PCIE_CORE_CMD_MEM_IO_REQ_EN;
- advk_writel(pcie, reg, PCIE_CORE_CMD_STATUS_REG);
}
static int advk_pcie_check_pio_status(struct advk_pcie *pcie, bool allow_crs, u32 *val)
@@ -753,6 +742,37 @@ static int advk_pcie_wait_pio(struct advk_pcie *pcie)
return -ETIMEDOUT;
}
+static pci_bridge_emul_read_status_t
+advk_pci_bridge_emul_base_conf_read(struct pci_bridge_emul *bridge,
+ int reg, u32 *value)
+{
+ struct advk_pcie *pcie = bridge->data;
+
+ switch (reg) {
+ case PCI_COMMAND:
+ *value = advk_readl(pcie, PCIE_CORE_CMD_STATUS_REG);
+ return PCI_BRIDGE_EMUL_HANDLED;
+
+ default:
+ return PCI_BRIDGE_EMUL_NOT_HANDLED;
+ }
+}
+
+static void
+advk_pci_bridge_emul_base_conf_write(struct pci_bridge_emul *bridge,
+ int reg, u32 old, u32 new, u32 mask)
+{
+ struct advk_pcie *pcie = bridge->data;
+
+ switch (reg) {
+ case PCI_COMMAND:
+ advk_writel(pcie, new, PCIE_CORE_CMD_STATUS_REG);
+ break;
+
+ default:
+ break;
+ }
+}
static pci_bridge_emul_read_status_t
advk_pci_bridge_emul_pcie_conf_read(struct pci_bridge_emul *bridge,
@@ -854,6 +874,8 @@ advk_pci_bridge_emul_pcie_conf_write(struct pci_bridge_emul *bridge,
}
static struct pci_bridge_emul_ops advk_pci_bridge_emul_ops = {
+ .read_base = advk_pci_bridge_emul_base_conf_read,
+ .write_base = advk_pci_bridge_emul_base_conf_write,
.read_pcie = advk_pci_bridge_emul_pcie_conf_read,
.write_pcie = advk_pci_bridge_emul_pcie_conf_write,
};
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a4e17d65dafdd3513042d8f00404c9b6068a825c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pali=20Roh=C3=A1r?= <pali(a)kernel.org>
Date: Tue, 5 Oct 2021 20:09:41 +0200
Subject: [PATCH] PCI: aardvark: Fix PCIe Max Payload Size setting
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Change PCIe Max Payload Size setting in PCIe Device Control register to 512
bytes to align with PCIe Link Initialization sequence as defined in Marvell
Armada 3700 Functional Specification. According to the specification,
maximal Max Payload Size supported by this device is 512 bytes.
Without this kernel prints suspicious line:
pci 0000:01:00.0: Upstream bridge's Max Payload Size set to 256 (was 16384, max 512)
With this change it changes to:
pci 0000:01:00.0: Upstream bridge's Max Payload Size set to 256 (was 512, max 512)
Link: https://lore.kernel.org/r/20211005180952.6812-3-kabel@kernel.org
Fixes: 8c39d710363c ("PCI: aardvark: Add Aardvark PCI host controller driver")
Signed-off-by: Pali Rohár <pali(a)kernel.org>
Signed-off-by: Marek Behún <kabel(a)kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Reviewed-by: Marek Behún <kabel(a)kernel.org>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
index 596ebcfcc82d..884510630bae 100644
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -488,8 +488,9 @@ static void advk_pcie_setup_hw(struct advk_pcie *pcie)
reg = advk_readl(pcie, PCIE_CORE_PCIEXP_CAP + PCI_EXP_DEVCTL);
reg &= ~PCI_EXP_DEVCTL_RELAX_EN;
reg &= ~PCI_EXP_DEVCTL_NOSNOOP_EN;
+ reg &= ~PCI_EXP_DEVCTL_PAYLOAD;
reg &= ~PCI_EXP_DEVCTL_READRQ;
- reg |= PCI_EXP_DEVCTL_PAYLOAD; /* Set max payload size */
+ reg |= PCI_EXP_DEVCTL_PAYLOAD_512B;
reg |= PCI_EXP_DEVCTL_READRQ_512B;
advk_writel(pcie, reg, PCIE_CORE_PCIEXP_CAP + PCI_EXP_DEVCTL);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a4e17d65dafdd3513042d8f00404c9b6068a825c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pali=20Roh=C3=A1r?= <pali(a)kernel.org>
Date: Tue, 5 Oct 2021 20:09:41 +0200
Subject: [PATCH] PCI: aardvark: Fix PCIe Max Payload Size setting
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Change PCIe Max Payload Size setting in PCIe Device Control register to 512
bytes to align with PCIe Link Initialization sequence as defined in Marvell
Armada 3700 Functional Specification. According to the specification,
maximal Max Payload Size supported by this device is 512 bytes.
Without this kernel prints suspicious line:
pci 0000:01:00.0: Upstream bridge's Max Payload Size set to 256 (was 16384, max 512)
With this change it changes to:
pci 0000:01:00.0: Upstream bridge's Max Payload Size set to 256 (was 512, max 512)
Link: https://lore.kernel.org/r/20211005180952.6812-3-kabel@kernel.org
Fixes: 8c39d710363c ("PCI: aardvark: Add Aardvark PCI host controller driver")
Signed-off-by: Pali Rohár <pali(a)kernel.org>
Signed-off-by: Marek Behún <kabel(a)kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Reviewed-by: Marek Behún <kabel(a)kernel.org>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
index 596ebcfcc82d..884510630bae 100644
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -488,8 +488,9 @@ static void advk_pcie_setup_hw(struct advk_pcie *pcie)
reg = advk_readl(pcie, PCIE_CORE_PCIEXP_CAP + PCI_EXP_DEVCTL);
reg &= ~PCI_EXP_DEVCTL_RELAX_EN;
reg &= ~PCI_EXP_DEVCTL_NOSNOOP_EN;
+ reg &= ~PCI_EXP_DEVCTL_PAYLOAD;
reg &= ~PCI_EXP_DEVCTL_READRQ;
- reg |= PCI_EXP_DEVCTL_PAYLOAD; /* Set max payload size */
+ reg |= PCI_EXP_DEVCTL_PAYLOAD_512B;
reg |= PCI_EXP_DEVCTL_READRQ_512B;
advk_writel(pcie, reg, PCIE_CORE_PCIEXP_CAP + PCI_EXP_DEVCTL);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a4e17d65dafdd3513042d8f00404c9b6068a825c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pali=20Roh=C3=A1r?= <pali(a)kernel.org>
Date: Tue, 5 Oct 2021 20:09:41 +0200
Subject: [PATCH] PCI: aardvark: Fix PCIe Max Payload Size setting
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Change PCIe Max Payload Size setting in PCIe Device Control register to 512
bytes to align with PCIe Link Initialization sequence as defined in Marvell
Armada 3700 Functional Specification. According to the specification,
maximal Max Payload Size supported by this device is 512 bytes.
Without this kernel prints suspicious line:
pci 0000:01:00.0: Upstream bridge's Max Payload Size set to 256 (was 16384, max 512)
With this change it changes to:
pci 0000:01:00.0: Upstream bridge's Max Payload Size set to 256 (was 512, max 512)
Link: https://lore.kernel.org/r/20211005180952.6812-3-kabel@kernel.org
Fixes: 8c39d710363c ("PCI: aardvark: Add Aardvark PCI host controller driver")
Signed-off-by: Pali Rohár <pali(a)kernel.org>
Signed-off-by: Marek Behún <kabel(a)kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Reviewed-by: Marek Behún <kabel(a)kernel.org>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
index 596ebcfcc82d..884510630bae 100644
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -488,8 +488,9 @@ static void advk_pcie_setup_hw(struct advk_pcie *pcie)
reg = advk_readl(pcie, PCIE_CORE_PCIEXP_CAP + PCI_EXP_DEVCTL);
reg &= ~PCI_EXP_DEVCTL_RELAX_EN;
reg &= ~PCI_EXP_DEVCTL_NOSNOOP_EN;
+ reg &= ~PCI_EXP_DEVCTL_PAYLOAD;
reg &= ~PCI_EXP_DEVCTL_READRQ;
- reg |= PCI_EXP_DEVCTL_PAYLOAD; /* Set max payload size */
+ reg |= PCI_EXP_DEVCTL_PAYLOAD_512B;
reg |= PCI_EXP_DEVCTL_READRQ_512B;
advk_writel(pcie, reg, PCIE_CORE_PCIEXP_CAP + PCI_EXP_DEVCTL);
11/14/2021 01:47:31 pm
Good day,
I sent an earlier email to stable(a)vger.kernel.org with all the details, did you see it or do I resend it again?
Regards,
Juaquin Dabeer
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ea401499e943c307e6d44af6c2b4e068643e7884 Mon Sep 17 00:00:00 2001
From: Lukas Wunner <lukas(a)wunner.de>
Date: Sat, 31 Jul 2021 14:39:01 +0200
Subject: [PATCH] PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot
Reset
Stuart Hayes reports that an error handled by DPC at a Root Port results
in pciehp gratuitously bringing down a subordinate hotplug port:
RP -- UP -- DP -- UP -- DP (hotplug) -- EP
pciehp brings the slot down because the Link to the Endpoint goes down.
That is caused by a Hot Reset being propagated as a result of DPC.
Per PCIe Base Spec 5.0, section 6.6.1 "Conventional Reset":
For a Switch, the following must cause a hot reset to be sent on all
Downstream Ports: [...]
* The Data Link Layer of the Upstream Port reporting DL_Down status.
In Switches that support Link speeds greater than 5.0 GT/s, the
Upstream Port must direct the LTSSM of each Downstream Port to the
Hot Reset state, but not hold the LTSSMs in that state. This permits
each Downstream Port to begin Link training immediately after its
hot reset completes. This behavior is recommended for all Switches.
* Receiving a hot reset on the Upstream Port.
Once DPC recovers, pcie_do_recovery() walks down the hierarchy and
invokes pcie_portdrv_slot_reset() to restore each port's config space.
At that point, a hotplug interrupt is signaled per PCIe Base Spec r5.0,
section 6.7.3.4 "Software Notification of Hot-Plug Events":
If the Port is enabled for edge-triggered interrupt signaling using
MSI or MSI-X, an interrupt message must be sent every time the logical
AND of the following conditions transitions from FALSE to TRUE: [...]
* The Hot-Plug Interrupt Enable bit in the Slot Control register is
set to 1b.
* At least one hot-plug event status bit in the Slot Status register
and its associated enable bit in the Slot Control register are both
set to 1b.
Prevent pciehp from gratuitously bringing down the slot by clearing the
error-induced Data Link Layer State Changed event before restoring
config space. Afterwards, check whether the link has unexpectedly
failed to retrain and synthesize a DLLSC event if so.
Allow each pcie_port_service_driver (one of them being pciehp) to define
a slot_reset callback and re-use the existing pm_iter() function to
iterate over the callbacks.
Thereby, the Endpoint driver remains bound throughout error recovery and
may restore the device to working state.
Surprise removal during error recovery is detected through a Presence
Detect Changed event. The hotplug port is expected to not signal that
event as a result of a Hot Reset.
The issue isn't DPC-specific, it also occurs when an error is handled by
AER through aer_root_reset(). So while the issue was noticed only now,
it's been around since 2006 when AER support was first introduced.
[bhelgaas: drop PCI_ERROR_RECOVERY Kconfig, split pm_iter() rename to
preparatory patch]
Link: https://lore.kernel.org/linux-pci/08c046b0-c9f2-3489-eeef-7e7aca435bb9@gmai…
Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
Link: https://lore.kernel.org/r/251f4edcc04c14f873ff1c967bc686169cd07d2d.16276381…
Reported-by: Stuart Hayes <stuart.w.hayes(a)gmail.com>
Tested-by: Stuart Hayes <stuart.w.hayes(a)gmail.com>
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: stable(a)vger.kernel.org # v2.6.19+: ba952824e6c1: PCI/portdrv: Report reset for frozen channel
Cc: Keith Busch <kbusch(a)kernel.org>
diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 69fd401691be..918dccbc74b6 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -189,6 +189,8 @@ int pciehp_get_attention_status(struct hotplug_slot *hotplug_slot, u8 *status);
int pciehp_set_raw_indicator_status(struct hotplug_slot *h_slot, u8 status);
int pciehp_get_raw_indicator_status(struct hotplug_slot *h_slot, u8 *status);
+int pciehp_slot_reset(struct pcie_device *dev);
+
static inline const char *slot_name(struct controller *ctrl)
{
return hotplug_slot_name(&ctrl->hotplug_slot);
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index ad3393930ecb..f34114d45259 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -351,6 +351,8 @@ static struct pcie_port_service_driver hpdriver_portdrv = {
.runtime_suspend = pciehp_runtime_suspend,
.runtime_resume = pciehp_runtime_resume,
#endif /* PM */
+
+ .slot_reset = pciehp_slot_reset,
};
int __init pcie_hp_init(void)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 3024d7e85e6a..83a0fa119cae 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -862,6 +862,32 @@ void pcie_disable_interrupt(struct controller *ctrl)
pcie_write_cmd(ctrl, 0, mask);
}
+/**
+ * pciehp_slot_reset() - ignore link event caused by error-induced hot reset
+ * @dev: PCI Express port service device
+ *
+ * Called from pcie_portdrv_slot_reset() after AER or DPC initiated a reset
+ * further up in the hierarchy to recover from an error. The reset was
+ * propagated down to this hotplug port. Ignore the resulting link flap.
+ * If the link failed to retrain successfully, synthesize the ignored event.
+ * Surprise removal during reset is detected through Presence Detect Changed.
+ */
+int pciehp_slot_reset(struct pcie_device *dev)
+{
+ struct controller *ctrl = get_service_data(dev);
+
+ if (ctrl->state != ON_STATE)
+ return 0;
+
+ pcie_capability_write_word(dev->port, PCI_EXP_SLTSTA,
+ PCI_EXP_SLTSTA_DLLSC);
+
+ if (!pciehp_check_link_active(ctrl))
+ pciehp_request(ctrl, PCI_EXP_SLTSTA_DLLSC);
+
+ return 0;
+}
+
/*
* pciehp has a 1:1 bus:slot relationship so we ultimately want a secondary
* bus reset of the bridge, but at the same time we want to ensure that it is
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 6126ee4676a7..41fe1ffd5907 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -85,6 +85,8 @@ struct pcie_port_service_driver {
int (*runtime_suspend)(struct pcie_device *dev);
int (*runtime_resume)(struct pcie_device *dev);
+ int (*slot_reset)(struct pcie_device *dev);
+
/* Device driver may resume normal operations */
void (*error_resume)(struct pci_dev *dev);
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index c7ff1eea225a..1af74c3d9d5d 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -160,6 +160,9 @@ static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
{
+ size_t off = offsetof(struct pcie_port_service_driver, slot_reset);
+ device_for_each_child(&dev->dev, &off, pcie_port_device_iter);
+
pci_restore_state(dev);
pci_save_state(dev);
return PCI_ERS_RESULT_RECOVERED;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ea401499e943c307e6d44af6c2b4e068643e7884 Mon Sep 17 00:00:00 2001
From: Lukas Wunner <lukas(a)wunner.de>
Date: Sat, 31 Jul 2021 14:39:01 +0200
Subject: [PATCH] PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot
Reset
Stuart Hayes reports that an error handled by DPC at a Root Port results
in pciehp gratuitously bringing down a subordinate hotplug port:
RP -- UP -- DP -- UP -- DP (hotplug) -- EP
pciehp brings the slot down because the Link to the Endpoint goes down.
That is caused by a Hot Reset being propagated as a result of DPC.
Per PCIe Base Spec 5.0, section 6.6.1 "Conventional Reset":
For a Switch, the following must cause a hot reset to be sent on all
Downstream Ports: [...]
* The Data Link Layer of the Upstream Port reporting DL_Down status.
In Switches that support Link speeds greater than 5.0 GT/s, the
Upstream Port must direct the LTSSM of each Downstream Port to the
Hot Reset state, but not hold the LTSSMs in that state. This permits
each Downstream Port to begin Link training immediately after its
hot reset completes. This behavior is recommended for all Switches.
* Receiving a hot reset on the Upstream Port.
Once DPC recovers, pcie_do_recovery() walks down the hierarchy and
invokes pcie_portdrv_slot_reset() to restore each port's config space.
At that point, a hotplug interrupt is signaled per PCIe Base Spec r5.0,
section 6.7.3.4 "Software Notification of Hot-Plug Events":
If the Port is enabled for edge-triggered interrupt signaling using
MSI or MSI-X, an interrupt message must be sent every time the logical
AND of the following conditions transitions from FALSE to TRUE: [...]
* The Hot-Plug Interrupt Enable bit in the Slot Control register is
set to 1b.
* At least one hot-plug event status bit in the Slot Status register
and its associated enable bit in the Slot Control register are both
set to 1b.
Prevent pciehp from gratuitously bringing down the slot by clearing the
error-induced Data Link Layer State Changed event before restoring
config space. Afterwards, check whether the link has unexpectedly
failed to retrain and synthesize a DLLSC event if so.
Allow each pcie_port_service_driver (one of them being pciehp) to define
a slot_reset callback and re-use the existing pm_iter() function to
iterate over the callbacks.
Thereby, the Endpoint driver remains bound throughout error recovery and
may restore the device to working state.
Surprise removal during error recovery is detected through a Presence
Detect Changed event. The hotplug port is expected to not signal that
event as a result of a Hot Reset.
The issue isn't DPC-specific, it also occurs when an error is handled by
AER through aer_root_reset(). So while the issue was noticed only now,
it's been around since 2006 when AER support was first introduced.
[bhelgaas: drop PCI_ERROR_RECOVERY Kconfig, split pm_iter() rename to
preparatory patch]
Link: https://lore.kernel.org/linux-pci/08c046b0-c9f2-3489-eeef-7e7aca435bb9@gmai…
Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
Link: https://lore.kernel.org/r/251f4edcc04c14f873ff1c967bc686169cd07d2d.16276381…
Reported-by: Stuart Hayes <stuart.w.hayes(a)gmail.com>
Tested-by: Stuart Hayes <stuart.w.hayes(a)gmail.com>
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: stable(a)vger.kernel.org # v2.6.19+: ba952824e6c1: PCI/portdrv: Report reset for frozen channel
Cc: Keith Busch <kbusch(a)kernel.org>
diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 69fd401691be..918dccbc74b6 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -189,6 +189,8 @@ int pciehp_get_attention_status(struct hotplug_slot *hotplug_slot, u8 *status);
int pciehp_set_raw_indicator_status(struct hotplug_slot *h_slot, u8 status);
int pciehp_get_raw_indicator_status(struct hotplug_slot *h_slot, u8 *status);
+int pciehp_slot_reset(struct pcie_device *dev);
+
static inline const char *slot_name(struct controller *ctrl)
{
return hotplug_slot_name(&ctrl->hotplug_slot);
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index ad3393930ecb..f34114d45259 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -351,6 +351,8 @@ static struct pcie_port_service_driver hpdriver_portdrv = {
.runtime_suspend = pciehp_runtime_suspend,
.runtime_resume = pciehp_runtime_resume,
#endif /* PM */
+
+ .slot_reset = pciehp_slot_reset,
};
int __init pcie_hp_init(void)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 3024d7e85e6a..83a0fa119cae 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -862,6 +862,32 @@ void pcie_disable_interrupt(struct controller *ctrl)
pcie_write_cmd(ctrl, 0, mask);
}
+/**
+ * pciehp_slot_reset() - ignore link event caused by error-induced hot reset
+ * @dev: PCI Express port service device
+ *
+ * Called from pcie_portdrv_slot_reset() after AER or DPC initiated a reset
+ * further up in the hierarchy to recover from an error. The reset was
+ * propagated down to this hotplug port. Ignore the resulting link flap.
+ * If the link failed to retrain successfully, synthesize the ignored event.
+ * Surprise removal during reset is detected through Presence Detect Changed.
+ */
+int pciehp_slot_reset(struct pcie_device *dev)
+{
+ struct controller *ctrl = get_service_data(dev);
+
+ if (ctrl->state != ON_STATE)
+ return 0;
+
+ pcie_capability_write_word(dev->port, PCI_EXP_SLTSTA,
+ PCI_EXP_SLTSTA_DLLSC);
+
+ if (!pciehp_check_link_active(ctrl))
+ pciehp_request(ctrl, PCI_EXP_SLTSTA_DLLSC);
+
+ return 0;
+}
+
/*
* pciehp has a 1:1 bus:slot relationship so we ultimately want a secondary
* bus reset of the bridge, but at the same time we want to ensure that it is
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 6126ee4676a7..41fe1ffd5907 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -85,6 +85,8 @@ struct pcie_port_service_driver {
int (*runtime_suspend)(struct pcie_device *dev);
int (*runtime_resume)(struct pcie_device *dev);
+ int (*slot_reset)(struct pcie_device *dev);
+
/* Device driver may resume normal operations */
void (*error_resume)(struct pci_dev *dev);
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index c7ff1eea225a..1af74c3d9d5d 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -160,6 +160,9 @@ static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
{
+ size_t off = offsetof(struct pcie_port_service_driver, slot_reset);
+ device_for_each_child(&dev->dev, &off, pcie_port_device_iter);
+
pci_restore_state(dev);
pci_save_state(dev);
return PCI_ERS_RESULT_RECOVERED;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ea401499e943c307e6d44af6c2b4e068643e7884 Mon Sep 17 00:00:00 2001
From: Lukas Wunner <lukas(a)wunner.de>
Date: Sat, 31 Jul 2021 14:39:01 +0200
Subject: [PATCH] PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot
Reset
Stuart Hayes reports that an error handled by DPC at a Root Port results
in pciehp gratuitously bringing down a subordinate hotplug port:
RP -- UP -- DP -- UP -- DP (hotplug) -- EP
pciehp brings the slot down because the Link to the Endpoint goes down.
That is caused by a Hot Reset being propagated as a result of DPC.
Per PCIe Base Spec 5.0, section 6.6.1 "Conventional Reset":
For a Switch, the following must cause a hot reset to be sent on all
Downstream Ports: [...]
* The Data Link Layer of the Upstream Port reporting DL_Down status.
In Switches that support Link speeds greater than 5.0 GT/s, the
Upstream Port must direct the LTSSM of each Downstream Port to the
Hot Reset state, but not hold the LTSSMs in that state. This permits
each Downstream Port to begin Link training immediately after its
hot reset completes. This behavior is recommended for all Switches.
* Receiving a hot reset on the Upstream Port.
Once DPC recovers, pcie_do_recovery() walks down the hierarchy and
invokes pcie_portdrv_slot_reset() to restore each port's config space.
At that point, a hotplug interrupt is signaled per PCIe Base Spec r5.0,
section 6.7.3.4 "Software Notification of Hot-Plug Events":
If the Port is enabled for edge-triggered interrupt signaling using
MSI or MSI-X, an interrupt message must be sent every time the logical
AND of the following conditions transitions from FALSE to TRUE: [...]
* The Hot-Plug Interrupt Enable bit in the Slot Control register is
set to 1b.
* At least one hot-plug event status bit in the Slot Status register
and its associated enable bit in the Slot Control register are both
set to 1b.
Prevent pciehp from gratuitously bringing down the slot by clearing the
error-induced Data Link Layer State Changed event before restoring
config space. Afterwards, check whether the link has unexpectedly
failed to retrain and synthesize a DLLSC event if so.
Allow each pcie_port_service_driver (one of them being pciehp) to define
a slot_reset callback and re-use the existing pm_iter() function to
iterate over the callbacks.
Thereby, the Endpoint driver remains bound throughout error recovery and
may restore the device to working state.
Surprise removal during error recovery is detected through a Presence
Detect Changed event. The hotplug port is expected to not signal that
event as a result of a Hot Reset.
The issue isn't DPC-specific, it also occurs when an error is handled by
AER through aer_root_reset(). So while the issue was noticed only now,
it's been around since 2006 when AER support was first introduced.
[bhelgaas: drop PCI_ERROR_RECOVERY Kconfig, split pm_iter() rename to
preparatory patch]
Link: https://lore.kernel.org/linux-pci/08c046b0-c9f2-3489-eeef-7e7aca435bb9@gmai…
Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
Link: https://lore.kernel.org/r/251f4edcc04c14f873ff1c967bc686169cd07d2d.16276381…
Reported-by: Stuart Hayes <stuart.w.hayes(a)gmail.com>
Tested-by: Stuart Hayes <stuart.w.hayes(a)gmail.com>
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: stable(a)vger.kernel.org # v2.6.19+: ba952824e6c1: PCI/portdrv: Report reset for frozen channel
Cc: Keith Busch <kbusch(a)kernel.org>
diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 69fd401691be..918dccbc74b6 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -189,6 +189,8 @@ int pciehp_get_attention_status(struct hotplug_slot *hotplug_slot, u8 *status);
int pciehp_set_raw_indicator_status(struct hotplug_slot *h_slot, u8 status);
int pciehp_get_raw_indicator_status(struct hotplug_slot *h_slot, u8 *status);
+int pciehp_slot_reset(struct pcie_device *dev);
+
static inline const char *slot_name(struct controller *ctrl)
{
return hotplug_slot_name(&ctrl->hotplug_slot);
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index ad3393930ecb..f34114d45259 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -351,6 +351,8 @@ static struct pcie_port_service_driver hpdriver_portdrv = {
.runtime_suspend = pciehp_runtime_suspend,
.runtime_resume = pciehp_runtime_resume,
#endif /* PM */
+
+ .slot_reset = pciehp_slot_reset,
};
int __init pcie_hp_init(void)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 3024d7e85e6a..83a0fa119cae 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -862,6 +862,32 @@ void pcie_disable_interrupt(struct controller *ctrl)
pcie_write_cmd(ctrl, 0, mask);
}
+/**
+ * pciehp_slot_reset() - ignore link event caused by error-induced hot reset
+ * @dev: PCI Express port service device
+ *
+ * Called from pcie_portdrv_slot_reset() after AER or DPC initiated a reset
+ * further up in the hierarchy to recover from an error. The reset was
+ * propagated down to this hotplug port. Ignore the resulting link flap.
+ * If the link failed to retrain successfully, synthesize the ignored event.
+ * Surprise removal during reset is detected through Presence Detect Changed.
+ */
+int pciehp_slot_reset(struct pcie_device *dev)
+{
+ struct controller *ctrl = get_service_data(dev);
+
+ if (ctrl->state != ON_STATE)
+ return 0;
+
+ pcie_capability_write_word(dev->port, PCI_EXP_SLTSTA,
+ PCI_EXP_SLTSTA_DLLSC);
+
+ if (!pciehp_check_link_active(ctrl))
+ pciehp_request(ctrl, PCI_EXP_SLTSTA_DLLSC);
+
+ return 0;
+}
+
/*
* pciehp has a 1:1 bus:slot relationship so we ultimately want a secondary
* bus reset of the bridge, but at the same time we want to ensure that it is
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 6126ee4676a7..41fe1ffd5907 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -85,6 +85,8 @@ struct pcie_port_service_driver {
int (*runtime_suspend)(struct pcie_device *dev);
int (*runtime_resume)(struct pcie_device *dev);
+ int (*slot_reset)(struct pcie_device *dev);
+
/* Device driver may resume normal operations */
void (*error_resume)(struct pci_dev *dev);
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index c7ff1eea225a..1af74c3d9d5d 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -160,6 +160,9 @@ static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
{
+ size_t off = offsetof(struct pcie_port_service_driver, slot_reset);
+ device_for_each_child(&dev->dev, &off, pcie_port_device_iter);
+
pci_restore_state(dev);
pci_save_state(dev);
return PCI_ERS_RESULT_RECOVERED;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ea401499e943c307e6d44af6c2b4e068643e7884 Mon Sep 17 00:00:00 2001
From: Lukas Wunner <lukas(a)wunner.de>
Date: Sat, 31 Jul 2021 14:39:01 +0200
Subject: [PATCH] PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot
Reset
Stuart Hayes reports that an error handled by DPC at a Root Port results
in pciehp gratuitously bringing down a subordinate hotplug port:
RP -- UP -- DP -- UP -- DP (hotplug) -- EP
pciehp brings the slot down because the Link to the Endpoint goes down.
That is caused by a Hot Reset being propagated as a result of DPC.
Per PCIe Base Spec 5.0, section 6.6.1 "Conventional Reset":
For a Switch, the following must cause a hot reset to be sent on all
Downstream Ports: [...]
* The Data Link Layer of the Upstream Port reporting DL_Down status.
In Switches that support Link speeds greater than 5.0 GT/s, the
Upstream Port must direct the LTSSM of each Downstream Port to the
Hot Reset state, but not hold the LTSSMs in that state. This permits
each Downstream Port to begin Link training immediately after its
hot reset completes. This behavior is recommended for all Switches.
* Receiving a hot reset on the Upstream Port.
Once DPC recovers, pcie_do_recovery() walks down the hierarchy and
invokes pcie_portdrv_slot_reset() to restore each port's config space.
At that point, a hotplug interrupt is signaled per PCIe Base Spec r5.0,
section 6.7.3.4 "Software Notification of Hot-Plug Events":
If the Port is enabled for edge-triggered interrupt signaling using
MSI or MSI-X, an interrupt message must be sent every time the logical
AND of the following conditions transitions from FALSE to TRUE: [...]
* The Hot-Plug Interrupt Enable bit in the Slot Control register is
set to 1b.
* At least one hot-plug event status bit in the Slot Status register
and its associated enable bit in the Slot Control register are both
set to 1b.
Prevent pciehp from gratuitously bringing down the slot by clearing the
error-induced Data Link Layer State Changed event before restoring
config space. Afterwards, check whether the link has unexpectedly
failed to retrain and synthesize a DLLSC event if so.
Allow each pcie_port_service_driver (one of them being pciehp) to define
a slot_reset callback and re-use the existing pm_iter() function to
iterate over the callbacks.
Thereby, the Endpoint driver remains bound throughout error recovery and
may restore the device to working state.
Surprise removal during error recovery is detected through a Presence
Detect Changed event. The hotplug port is expected to not signal that
event as a result of a Hot Reset.
The issue isn't DPC-specific, it also occurs when an error is handled by
AER through aer_root_reset(). So while the issue was noticed only now,
it's been around since 2006 when AER support was first introduced.
[bhelgaas: drop PCI_ERROR_RECOVERY Kconfig, split pm_iter() rename to
preparatory patch]
Link: https://lore.kernel.org/linux-pci/08c046b0-c9f2-3489-eeef-7e7aca435bb9@gmai…
Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
Link: https://lore.kernel.org/r/251f4edcc04c14f873ff1c967bc686169cd07d2d.16276381…
Reported-by: Stuart Hayes <stuart.w.hayes(a)gmail.com>
Tested-by: Stuart Hayes <stuart.w.hayes(a)gmail.com>
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: stable(a)vger.kernel.org # v2.6.19+: ba952824e6c1: PCI/portdrv: Report reset for frozen channel
Cc: Keith Busch <kbusch(a)kernel.org>
diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 69fd401691be..918dccbc74b6 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -189,6 +189,8 @@ int pciehp_get_attention_status(struct hotplug_slot *hotplug_slot, u8 *status);
int pciehp_set_raw_indicator_status(struct hotplug_slot *h_slot, u8 status);
int pciehp_get_raw_indicator_status(struct hotplug_slot *h_slot, u8 *status);
+int pciehp_slot_reset(struct pcie_device *dev);
+
static inline const char *slot_name(struct controller *ctrl)
{
return hotplug_slot_name(&ctrl->hotplug_slot);
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index ad3393930ecb..f34114d45259 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -351,6 +351,8 @@ static struct pcie_port_service_driver hpdriver_portdrv = {
.runtime_suspend = pciehp_runtime_suspend,
.runtime_resume = pciehp_runtime_resume,
#endif /* PM */
+
+ .slot_reset = pciehp_slot_reset,
};
int __init pcie_hp_init(void)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 3024d7e85e6a..83a0fa119cae 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -862,6 +862,32 @@ void pcie_disable_interrupt(struct controller *ctrl)
pcie_write_cmd(ctrl, 0, mask);
}
+/**
+ * pciehp_slot_reset() - ignore link event caused by error-induced hot reset
+ * @dev: PCI Express port service device
+ *
+ * Called from pcie_portdrv_slot_reset() after AER or DPC initiated a reset
+ * further up in the hierarchy to recover from an error. The reset was
+ * propagated down to this hotplug port. Ignore the resulting link flap.
+ * If the link failed to retrain successfully, synthesize the ignored event.
+ * Surprise removal during reset is detected through Presence Detect Changed.
+ */
+int pciehp_slot_reset(struct pcie_device *dev)
+{
+ struct controller *ctrl = get_service_data(dev);
+
+ if (ctrl->state != ON_STATE)
+ return 0;
+
+ pcie_capability_write_word(dev->port, PCI_EXP_SLTSTA,
+ PCI_EXP_SLTSTA_DLLSC);
+
+ if (!pciehp_check_link_active(ctrl))
+ pciehp_request(ctrl, PCI_EXP_SLTSTA_DLLSC);
+
+ return 0;
+}
+
/*
* pciehp has a 1:1 bus:slot relationship so we ultimately want a secondary
* bus reset of the bridge, but at the same time we want to ensure that it is
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 6126ee4676a7..41fe1ffd5907 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -85,6 +85,8 @@ struct pcie_port_service_driver {
int (*runtime_suspend)(struct pcie_device *dev);
int (*runtime_resume)(struct pcie_device *dev);
+ int (*slot_reset)(struct pcie_device *dev);
+
/* Device driver may resume normal operations */
void (*error_resume)(struct pci_dev *dev);
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index c7ff1eea225a..1af74c3d9d5d 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -160,6 +160,9 @@ static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
{
+ size_t off = offsetof(struct pcie_port_service_driver, slot_reset);
+ device_for_each_child(&dev->dev, &off, pcie_port_device_iter);
+
pci_restore_state(dev);
pci_save_state(dev);
return PCI_ERS_RESULT_RECOVERED;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7dfbc624eb5726367900c8d86deff50836240361 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 9 Nov 2021 01:30:44 +0000
Subject: [PATCH] KVM: nVMX: Query current VMCS when determining if MSR bitmaps
are in use
Check the current VMCS controls to determine if an MSR write will be
intercepted due to MSR bitmaps being disabled. In the nested VMX case,
KVM will disable MSR bitmaps in vmcs02 if they're disabled in vmcs12 or
if KVM can't map L1's bitmaps for whatever reason.
Note, the bad behavior is relatively benign in the current code base as
KVM sets all bits in vmcs02's MSR bitmap by default, clears bits if and
only if L0 KVM also disables interception of an MSR, and only uses the
buggy helper for MSR_IA32_SPEC_CTRL. Because KVM explicitly tests WRMSR
before disabling interception of MSR_IA32_SPEC_CTRL, the flawed check
will only result in KVM reading MSR_IA32_SPEC_CTRL from hardware when it
isn't strictly necessary.
Tag the fix for stable in case a future fix wants to use
msr_write_intercepted(), in which case a buggy implementation in older
kernels could prove subtly problematic.
Fixes: d28b387fb74d ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20211109013047.2041518-2-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e4fc9ff7cd94..16726418ada9 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -769,15 +769,15 @@ void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu)
/*
* Check if MSR is intercepted for currently loaded MSR bitmap.
*/
-static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
+static bool msr_write_intercepted(struct vcpu_vmx *vmx, u32 msr)
{
unsigned long *msr_bitmap;
int f = sizeof(unsigned long);
- if (!cpu_has_vmx_msr_bitmap())
+ if (!(exec_controls_get(vmx) & CPU_BASED_USE_MSR_BITMAPS))
return true;
- msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
+ msr_bitmap = vmx->loaded_vmcs->msr_bitmap;
if (msr <= 0x1fff) {
return !!test_bit(msr, msr_bitmap + 0x800 / f);
@@ -6751,7 +6751,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
* If the L02 MSR bitmap does not intercept the MSR, then we need to
* save it.
*/
- if (unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
+ if (unlikely(!msr_write_intercepted(vmx, MSR_IA32_SPEC_CTRL)))
vmx->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
x86_spec_ctrl_restore_host(vmx->spec_ctrl, 0);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7dfbc624eb5726367900c8d86deff50836240361 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 9 Nov 2021 01:30:44 +0000
Subject: [PATCH] KVM: nVMX: Query current VMCS when determining if MSR bitmaps
are in use
Check the current VMCS controls to determine if an MSR write will be
intercepted due to MSR bitmaps being disabled. In the nested VMX case,
KVM will disable MSR bitmaps in vmcs02 if they're disabled in vmcs12 or
if KVM can't map L1's bitmaps for whatever reason.
Note, the bad behavior is relatively benign in the current code base as
KVM sets all bits in vmcs02's MSR bitmap by default, clears bits if and
only if L0 KVM also disables interception of an MSR, and only uses the
buggy helper for MSR_IA32_SPEC_CTRL. Because KVM explicitly tests WRMSR
before disabling interception of MSR_IA32_SPEC_CTRL, the flawed check
will only result in KVM reading MSR_IA32_SPEC_CTRL from hardware when it
isn't strictly necessary.
Tag the fix for stable in case a future fix wants to use
msr_write_intercepted(), in which case a buggy implementation in older
kernels could prove subtly problematic.
Fixes: d28b387fb74d ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20211109013047.2041518-2-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e4fc9ff7cd94..16726418ada9 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -769,15 +769,15 @@ void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu)
/*
* Check if MSR is intercepted for currently loaded MSR bitmap.
*/
-static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
+static bool msr_write_intercepted(struct vcpu_vmx *vmx, u32 msr)
{
unsigned long *msr_bitmap;
int f = sizeof(unsigned long);
- if (!cpu_has_vmx_msr_bitmap())
+ if (!(exec_controls_get(vmx) & CPU_BASED_USE_MSR_BITMAPS))
return true;
- msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
+ msr_bitmap = vmx->loaded_vmcs->msr_bitmap;
if (msr <= 0x1fff) {
return !!test_bit(msr, msr_bitmap + 0x800 / f);
@@ -6751,7 +6751,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
* If the L02 MSR bitmap does not intercept the MSR, then we need to
* save it.
*/
- if (unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
+ if (unlikely(!msr_write_intercepted(vmx, MSR_IA32_SPEC_CTRL)))
vmx->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
x86_spec_ctrl_restore_host(vmx->spec_ctrl, 0);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7dfbc624eb5726367900c8d86deff50836240361 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 9 Nov 2021 01:30:44 +0000
Subject: [PATCH] KVM: nVMX: Query current VMCS when determining if MSR bitmaps
are in use
Check the current VMCS controls to determine if an MSR write will be
intercepted due to MSR bitmaps being disabled. In the nested VMX case,
KVM will disable MSR bitmaps in vmcs02 if they're disabled in vmcs12 or
if KVM can't map L1's bitmaps for whatever reason.
Note, the bad behavior is relatively benign in the current code base as
KVM sets all bits in vmcs02's MSR bitmap by default, clears bits if and
only if L0 KVM also disables interception of an MSR, and only uses the
buggy helper for MSR_IA32_SPEC_CTRL. Because KVM explicitly tests WRMSR
before disabling interception of MSR_IA32_SPEC_CTRL, the flawed check
will only result in KVM reading MSR_IA32_SPEC_CTRL from hardware when it
isn't strictly necessary.
Tag the fix for stable in case a future fix wants to use
msr_write_intercepted(), in which case a buggy implementation in older
kernels could prove subtly problematic.
Fixes: d28b387fb74d ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20211109013047.2041518-2-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e4fc9ff7cd94..16726418ada9 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -769,15 +769,15 @@ void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu)
/*
* Check if MSR is intercepted for currently loaded MSR bitmap.
*/
-static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
+static bool msr_write_intercepted(struct vcpu_vmx *vmx, u32 msr)
{
unsigned long *msr_bitmap;
int f = sizeof(unsigned long);
- if (!cpu_has_vmx_msr_bitmap())
+ if (!(exec_controls_get(vmx) & CPU_BASED_USE_MSR_BITMAPS))
return true;
- msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
+ msr_bitmap = vmx->loaded_vmcs->msr_bitmap;
if (msr <= 0x1fff) {
return !!test_bit(msr, msr_bitmap + 0x800 / f);
@@ -6751,7 +6751,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
* If the L02 MSR bitmap does not intercept the MSR, then we need to
* save it.
*/
- if (unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
+ if (unlikely(!msr_write_intercepted(vmx, MSR_IA32_SPEC_CTRL)))
vmx->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
x86_spec_ctrl_restore_host(vmx->spec_ctrl, 0);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 027b57170bf8bb6999a28e4a5f3d78bf1db0f90c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pali=20Roh=C3=A1r?= <pali(a)kernel.org>
Date: Sat, 2 Oct 2021 15:09:00 +0200
Subject: [PATCH] serial: core: Fix initializing and restoring termios speed
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Since commit edc6afc54968 ("tty: switch to ktermios and new framework")
termios speed is no longer stored only in c_cflag member but also in new
additional c_ispeed and c_ospeed members. If BOTHER flag is set in c_cflag
then termios speed is stored only in these new members.
Therefore to correctly restore termios speed it is required to store also
ispeed and ospeed members, not only cflag member.
In case only cflag member with BOTHER flag is restored then functions
tty_termios_baud_rate() and tty_termios_input_baud_rate() returns baudrate
stored in c_ospeed / c_ispeed member, which is zero as it was not restored
too. If reported baudrate is invalid (e.g. zero) then serial core functions
report fallback baudrate value 9600. So it means that in this case original
baudrate is lost and kernel changes it to value 9600.
Simple reproducer of this issue is to boot kernel with following command
line argument: "console=ttyXXX,86400" (where ttyXXX is the device name).
For speed 86400 there is no Bnnn constant and therefore kernel has to
represent this speed via BOTHER c_cflag. Which means that speed is stored
only in c_ospeed and c_ispeed members, not in c_cflag anymore.
If bootloader correctly configures serial device to speed 86400 then kernel
prints boot log to early console at speed speed 86400 without any issue.
But after kernel starts initializing real console device ttyXXX then speed
is changed to fallback value 9600 because information about speed was lost.
This patch fixes above issue by storing and restoring also ispeed and
ospeed members, which are required for BOTHER flag.
Fixes: edc6afc54968 ("[PATCH] tty: switch to ktermios and new framework")
Cc: stable(a)vger.kernel.org
Signed-off-by: Pali Rohár <pali(a)kernel.org>
Link: https://lore.kernel.org/r/20211002130900.9518-1-pali@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index 0e2e35ab64c7..1e738f265eea 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -222,7 +222,11 @@ static int uart_port_startup(struct tty_struct *tty, struct uart_state *state,
if (retval == 0) {
if (uart_console(uport) && uport->cons->cflag) {
tty->termios.c_cflag = uport->cons->cflag;
+ tty->termios.c_ispeed = uport->cons->ispeed;
+ tty->termios.c_ospeed = uport->cons->ospeed;
uport->cons->cflag = 0;
+ uport->cons->ispeed = 0;
+ uport->cons->ospeed = 0;
}
/*
* Initialise the hardware port settings.
@@ -290,8 +294,11 @@ static void uart_shutdown(struct tty_struct *tty, struct uart_state *state)
/*
* Turn off DTR and RTS early.
*/
- if (uport && uart_console(uport) && tty)
+ if (uport && uart_console(uport) && tty) {
uport->cons->cflag = tty->termios.c_cflag;
+ uport->cons->ispeed = tty->termios.c_ispeed;
+ uport->cons->ospeed = tty->termios.c_ospeed;
+ }
if (!tty || C_HUPCL(tty))
uart_port_dtr_rts(uport, 0);
@@ -2094,8 +2101,11 @@ uart_set_options(struct uart_port *port, struct console *co,
* Allow the setting of the UART parameters with a NULL console
* too:
*/
- if (co)
+ if (co) {
co->cflag = termios.c_cflag;
+ co->ispeed = termios.c_ispeed;
+ co->ospeed = termios.c_ospeed;
+ }
return 0;
}
@@ -2229,6 +2239,8 @@ int uart_resume_port(struct uart_driver *drv, struct uart_port *uport)
*/
memset(&termios, 0, sizeof(struct ktermios));
termios.c_cflag = uport->cons->cflag;
+ termios.c_ispeed = uport->cons->ispeed;
+ termios.c_ospeed = uport->cons->ospeed;
/*
* If that's unset, use the tty termios setting.
diff --git a/include/linux/console.h b/include/linux/console.h
index 20874db50bc8..a97f277cfdfa 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -149,6 +149,8 @@ struct console {
short flags;
short index;
int cflag;
+ uint ispeed;
+ uint ospeed;
void *data;
struct console *next;
};
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 164051a6ab5445bd97f719f50b16db8b32174269 Mon Sep 17 00:00:00 2001
From: Zhang Changzhong <zhangchangzhong(a)huawei.com>
Date: Thu, 28 Oct 2021 22:38:27 +0800
Subject: [PATCH] can: j1939: j1939_tp_cmd_recv(): check the dst address of
TP.CM_BAM
The TP.CM_BAM message must be sent to the global address [1], so add a
check to drop TP.CM_BAM sent to a non-global address.
Without this patch, the receiver will treat the following packets as
normal RTS/CTS transport:
18EC0102#20090002FF002301
18EB0102#0100000000000000
18EB0102#020000FFFFFFFFFF
[1] SAE-J1939-82 2015 A.3.3 Row 1.
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Link: https://lore.kernel.org/all/1635431907-15617-4-git-send-email-zhangchangzho…
Cc: stable(a)vger.kernel.org
Signed-off-by: Zhang Changzhong <zhangchangzhong(a)huawei.com>
Acked-by: Oleksij Rempel <o.rempel(a)pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c
index 05eb3d059e17..a271688780a2 100644
--- a/net/can/j1939/transport.c
+++ b/net/can/j1939/transport.c
@@ -2023,6 +2023,11 @@ static void j1939_tp_cmd_recv(struct j1939_priv *priv, struct sk_buff *skb)
extd = J1939_ETP;
fallthrough;
case J1939_TP_CMD_BAM:
+ if (cmd == J1939_TP_CMD_BAM && !j1939_cb_is_broadcast(skcb)) {
+ netdev_err_once(priv->ndev, "%s: BAM to unicast (%02x), ignoring!\n",
+ __func__, skcb->addr.sa);
+ return;
+ }
fallthrough;
case J1939_TP_CMD_RTS:
if (skcb->addr.type != extd)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 164051a6ab5445bd97f719f50b16db8b32174269 Mon Sep 17 00:00:00 2001
From: Zhang Changzhong <zhangchangzhong(a)huawei.com>
Date: Thu, 28 Oct 2021 22:38:27 +0800
Subject: [PATCH] can: j1939: j1939_tp_cmd_recv(): check the dst address of
TP.CM_BAM
The TP.CM_BAM message must be sent to the global address [1], so add a
check to drop TP.CM_BAM sent to a non-global address.
Without this patch, the receiver will treat the following packets as
normal RTS/CTS transport:
18EC0102#20090002FF002301
18EB0102#0100000000000000
18EB0102#020000FFFFFFFFFF
[1] SAE-J1939-82 2015 A.3.3 Row 1.
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Link: https://lore.kernel.org/all/1635431907-15617-4-git-send-email-zhangchangzho…
Cc: stable(a)vger.kernel.org
Signed-off-by: Zhang Changzhong <zhangchangzhong(a)huawei.com>
Acked-by: Oleksij Rempel <o.rempel(a)pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c
index 05eb3d059e17..a271688780a2 100644
--- a/net/can/j1939/transport.c
+++ b/net/can/j1939/transport.c
@@ -2023,6 +2023,11 @@ static void j1939_tp_cmd_recv(struct j1939_priv *priv, struct sk_buff *skb)
extd = J1939_ETP;
fallthrough;
case J1939_TP_CMD_BAM:
+ if (cmd == J1939_TP_CMD_BAM && !j1939_cb_is_broadcast(skcb)) {
+ netdev_err_once(priv->ndev, "%s: BAM to unicast (%02x), ignoring!\n",
+ __func__, skcb->addr.sa);
+ return;
+ }
fallthrough;
case J1939_TP_CMD_RTS:
if (skcb->addr.type != extd)
The patch below does not apply to the 5.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 164051a6ab5445bd97f719f50b16db8b32174269 Mon Sep 17 00:00:00 2001
From: Zhang Changzhong <zhangchangzhong(a)huawei.com>
Date: Thu, 28 Oct 2021 22:38:27 +0800
Subject: [PATCH] can: j1939: j1939_tp_cmd_recv(): check the dst address of
TP.CM_BAM
The TP.CM_BAM message must be sent to the global address [1], so add a
check to drop TP.CM_BAM sent to a non-global address.
Without this patch, the receiver will treat the following packets as
normal RTS/CTS transport:
18EC0102#20090002FF002301
18EB0102#0100000000000000
18EB0102#020000FFFFFFFFFF
[1] SAE-J1939-82 2015 A.3.3 Row 1.
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Link: https://lore.kernel.org/all/1635431907-15617-4-git-send-email-zhangchangzho…
Cc: stable(a)vger.kernel.org
Signed-off-by: Zhang Changzhong <zhangchangzhong(a)huawei.com>
Acked-by: Oleksij Rempel <o.rempel(a)pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c
index 05eb3d059e17..a271688780a2 100644
--- a/net/can/j1939/transport.c
+++ b/net/can/j1939/transport.c
@@ -2023,6 +2023,11 @@ static void j1939_tp_cmd_recv(struct j1939_priv *priv, struct sk_buff *skb)
extd = J1939_ETP;
fallthrough;
case J1939_TP_CMD_BAM:
+ if (cmd == J1939_TP_CMD_BAM && !j1939_cb_is_broadcast(skcb)) {
+ netdev_err_once(priv->ndev, "%s: BAM to unicast (%02x), ignoring!\n",
+ __func__, skcb->addr.sa);
+ return;
+ }
fallthrough;
case J1939_TP_CMD_RTS:
if (skcb->addr.type != extd)
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7dfbc624eb5726367900c8d86deff50836240361 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 9 Nov 2021 01:30:44 +0000
Subject: [PATCH] KVM: nVMX: Query current VMCS when determining if MSR bitmaps
are in use
Check the current VMCS controls to determine if an MSR write will be
intercepted due to MSR bitmaps being disabled. In the nested VMX case,
KVM will disable MSR bitmaps in vmcs02 if they're disabled in vmcs12 or
if KVM can't map L1's bitmaps for whatever reason.
Note, the bad behavior is relatively benign in the current code base as
KVM sets all bits in vmcs02's MSR bitmap by default, clears bits if and
only if L0 KVM also disables interception of an MSR, and only uses the
buggy helper for MSR_IA32_SPEC_CTRL. Because KVM explicitly tests WRMSR
before disabling interception of MSR_IA32_SPEC_CTRL, the flawed check
will only result in KVM reading MSR_IA32_SPEC_CTRL from hardware when it
isn't strictly necessary.
Tag the fix for stable in case a future fix wants to use
msr_write_intercepted(), in which case a buggy implementation in older
kernels could prove subtly problematic.
Fixes: d28b387fb74d ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20211109013047.2041518-2-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e4fc9ff7cd94..16726418ada9 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -769,15 +769,15 @@ void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu)
/*
* Check if MSR is intercepted for currently loaded MSR bitmap.
*/
-static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
+static bool msr_write_intercepted(struct vcpu_vmx *vmx, u32 msr)
{
unsigned long *msr_bitmap;
int f = sizeof(unsigned long);
- if (!cpu_has_vmx_msr_bitmap())
+ if (!(exec_controls_get(vmx) & CPU_BASED_USE_MSR_BITMAPS))
return true;
- msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
+ msr_bitmap = vmx->loaded_vmcs->msr_bitmap;
if (msr <= 0x1fff) {
return !!test_bit(msr, msr_bitmap + 0x800 / f);
@@ -6751,7 +6751,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
* If the L02 MSR bitmap does not intercept the MSR, then we need to
* save it.
*/
- if (unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
+ if (unlikely(!msr_write_intercepted(vmx, MSR_IA32_SPEC_CTRL)))
vmx->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
x86_spec_ctrl_restore_host(vmx->spec_ctrl, 0);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8b44b174f6aca815fc84c2038e4523ef8e32fabb Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 5 Nov 2021 09:51:00 +0000
Subject: [PATCH] KVM: x86: Add helper to consolidate core logic of
SET_CPUID{2} flows
Move the core logic of SET_CPUID and SET_CPUID2 to a common helper, the
only difference between the two ioctls() is the format of the userspace
struct. A future fix will add yet more code to the core logic.
No functional change intended.
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20211105095101.5384-2-pdurrant(a)amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 2d70edb0f323..41529c168e91 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -239,6 +239,25 @@ u64 kvm_vcpu_reserved_gpa_bits_raw(struct kvm_vcpu *vcpu)
return rsvd_bits(cpuid_maxphyaddr(vcpu), 63);
}
+static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
+ int nent)
+{
+ int r;
+
+ r = kvm_check_cpuid(e2, nent);
+ if (r)
+ return r;
+
+ kvfree(vcpu->arch.cpuid_entries);
+ vcpu->arch.cpuid_entries = e2;
+ vcpu->arch.cpuid_nent = nent;
+
+ kvm_update_cpuid_runtime(vcpu);
+ kvm_vcpu_after_set_cpuid(vcpu);
+
+ return 0;
+}
+
/* when an old userspace process fills a new kernel module */
int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
struct kvm_cpuid *cpuid,
@@ -275,18 +294,9 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
e2[i].padding[2] = 0;
}
- r = kvm_check_cpuid(e2, cpuid->nent);
- if (r) {
+ r = kvm_set_cpuid(vcpu, e2, cpuid->nent);
+ if (r)
kvfree(e2);
- goto out_free_cpuid;
- }
-
- kvfree(vcpu->arch.cpuid_entries);
- vcpu->arch.cpuid_entries = e2;
- vcpu->arch.cpuid_nent = cpuid->nent;
-
- kvm_update_cpuid_runtime(vcpu);
- kvm_vcpu_after_set_cpuid(vcpu);
out_free_cpuid:
kvfree(e);
@@ -310,20 +320,11 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu,
return PTR_ERR(e2);
}
- r = kvm_check_cpuid(e2, cpuid->nent);
- if (r) {
+ r = kvm_set_cpuid(vcpu, e2, cpuid->nent);
+ if (r)
kvfree(e2);
- return r;
- }
- kvfree(vcpu->arch.cpuid_entries);
- vcpu->arch.cpuid_entries = e2;
- vcpu->arch.cpuid_nent = cpuid->nent;
-
- kvm_update_cpuid_runtime(vcpu);
- kvm_vcpu_after_set_cpuid(vcpu);
-
- return 0;
+ return r;
}
int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8b44b174f6aca815fc84c2038e4523ef8e32fabb Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 5 Nov 2021 09:51:00 +0000
Subject: [PATCH] KVM: x86: Add helper to consolidate core logic of
SET_CPUID{2} flows
Move the core logic of SET_CPUID and SET_CPUID2 to a common helper, the
only difference between the two ioctls() is the format of the userspace
struct. A future fix will add yet more code to the core logic.
No functional change intended.
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20211105095101.5384-2-pdurrant(a)amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 2d70edb0f323..41529c168e91 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -239,6 +239,25 @@ u64 kvm_vcpu_reserved_gpa_bits_raw(struct kvm_vcpu *vcpu)
return rsvd_bits(cpuid_maxphyaddr(vcpu), 63);
}
+static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
+ int nent)
+{
+ int r;
+
+ r = kvm_check_cpuid(e2, nent);
+ if (r)
+ return r;
+
+ kvfree(vcpu->arch.cpuid_entries);
+ vcpu->arch.cpuid_entries = e2;
+ vcpu->arch.cpuid_nent = nent;
+
+ kvm_update_cpuid_runtime(vcpu);
+ kvm_vcpu_after_set_cpuid(vcpu);
+
+ return 0;
+}
+
/* when an old userspace process fills a new kernel module */
int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
struct kvm_cpuid *cpuid,
@@ -275,18 +294,9 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
e2[i].padding[2] = 0;
}
- r = kvm_check_cpuid(e2, cpuid->nent);
- if (r) {
+ r = kvm_set_cpuid(vcpu, e2, cpuid->nent);
+ if (r)
kvfree(e2);
- goto out_free_cpuid;
- }
-
- kvfree(vcpu->arch.cpuid_entries);
- vcpu->arch.cpuid_entries = e2;
- vcpu->arch.cpuid_nent = cpuid->nent;
-
- kvm_update_cpuid_runtime(vcpu);
- kvm_vcpu_after_set_cpuid(vcpu);
out_free_cpuid:
kvfree(e);
@@ -310,20 +320,11 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu,
return PTR_ERR(e2);
}
- r = kvm_check_cpuid(e2, cpuid->nent);
- if (r) {
+ r = kvm_set_cpuid(vcpu, e2, cpuid->nent);
+ if (r)
kvfree(e2);
- return r;
- }
- kvfree(vcpu->arch.cpuid_entries);
- vcpu->arch.cpuid_entries = e2;
- vcpu->arch.cpuid_nent = cpuid->nent;
-
- kvm_update_cpuid_runtime(vcpu);
- kvm_vcpu_after_set_cpuid(vcpu);
-
- return 0;
+ return r;
}
int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8b44b174f6aca815fc84c2038e4523ef8e32fabb Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 5 Nov 2021 09:51:00 +0000
Subject: [PATCH] KVM: x86: Add helper to consolidate core logic of
SET_CPUID{2} flows
Move the core logic of SET_CPUID and SET_CPUID2 to a common helper, the
only difference between the two ioctls() is the format of the userspace
struct. A future fix will add yet more code to the core logic.
No functional change intended.
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20211105095101.5384-2-pdurrant(a)amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 2d70edb0f323..41529c168e91 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -239,6 +239,25 @@ u64 kvm_vcpu_reserved_gpa_bits_raw(struct kvm_vcpu *vcpu)
return rsvd_bits(cpuid_maxphyaddr(vcpu), 63);
}
+static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
+ int nent)
+{
+ int r;
+
+ r = kvm_check_cpuid(e2, nent);
+ if (r)
+ return r;
+
+ kvfree(vcpu->arch.cpuid_entries);
+ vcpu->arch.cpuid_entries = e2;
+ vcpu->arch.cpuid_nent = nent;
+
+ kvm_update_cpuid_runtime(vcpu);
+ kvm_vcpu_after_set_cpuid(vcpu);
+
+ return 0;
+}
+
/* when an old userspace process fills a new kernel module */
int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
struct kvm_cpuid *cpuid,
@@ -275,18 +294,9 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
e2[i].padding[2] = 0;
}
- r = kvm_check_cpuid(e2, cpuid->nent);
- if (r) {
+ r = kvm_set_cpuid(vcpu, e2, cpuid->nent);
+ if (r)
kvfree(e2);
- goto out_free_cpuid;
- }
-
- kvfree(vcpu->arch.cpuid_entries);
- vcpu->arch.cpuid_entries = e2;
- vcpu->arch.cpuid_nent = cpuid->nent;
-
- kvm_update_cpuid_runtime(vcpu);
- kvm_vcpu_after_set_cpuid(vcpu);
out_free_cpuid:
kvfree(e);
@@ -310,20 +320,11 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu,
return PTR_ERR(e2);
}
- r = kvm_check_cpuid(e2, cpuid->nent);
- if (r) {
+ r = kvm_set_cpuid(vcpu, e2, cpuid->nent);
+ if (r)
kvfree(e2);
- return r;
- }
- kvfree(vcpu->arch.cpuid_entries);
- vcpu->arch.cpuid_entries = e2;
- vcpu->arch.cpuid_nent = cpuid->nent;
-
- kvm_update_cpuid_runtime(vcpu);
- kvm_vcpu_after_set_cpuid(vcpu);
-
- return 0;
+ return r;
}
int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7e2175ebd695f17860c5bd4ad7616cce12ed4591 Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw2(a)infradead.org>
Date: Tue, 2 Nov 2021 17:36:39 +0000
Subject: [PATCH] KVM: x86: Fix recording of guest steal time / preempted
status
In commit b043138246a4 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is
not missed") we switched to using a gfn_to_pfn_cache for accessing the
guest steal time structure in order to allow for an atomic xchg of the
preempted field. This has a couple of problems.
Firstly, kvm_map_gfn() doesn't work at all for IOMEM pages when the
atomic flag is set, which it is in kvm_steal_time_set_preempted(). So a
guest vCPU using an IOMEM page for its steal time would never have its
preempted field set.
Secondly, the gfn_to_pfn_cache is not invalidated in all cases where it
should have been. There are two stages to the GFN->PFN conversion;
first the GFN is converted to a userspace HVA, and then that HVA is
looked up in the process page tables to find the underlying host PFN.
Correct invalidation of the latter would require being hooked up to the
MMU notifiers, but that doesn't happen---so it just keeps mapping and
unmapping the *wrong* PFN after the userspace page tables change.
In the !IOMEM case at least the stale page *is* pinned all the time it's
cached, so it won't be freed and reused by anyone else while still
receiving the steal time updates. The map/unmap dance only takes care
of the KVM administrivia such as marking the page dirty.
Until the gfn_to_pfn cache handles the remapping automatically by
integrating with the MMU notifiers, we might as well not get a
kernel mapping of it, and use the perfectly serviceable userspace HVA
that we already have. We just need to implement the atomic xchg on
the userspace address with appropriate exception handling, which is
fairly trivial.
Cc: stable(a)vger.kernel.org
Fixes: b043138246a4 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed")
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Message-Id: <3645b9b889dac6438394194bb5586a46b68d581f.camel(a)infradead.org>
[I didn't entirely agree with David's assessment of the
usefulness of the gfn_to_pfn cache, and integrated the outcome
of the discussion in the above commit message. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 88fce6ab4bbd..32345241e620 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -748,7 +748,7 @@ struct kvm_vcpu_arch {
u8 preempted;
u64 msr_val;
u64 last_steal;
- struct gfn_to_pfn_cache cache;
+ struct gfn_to_hva_cache cache;
} st;
u64 l1_tsc_offset;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ac83d873d65b..e7d2ef944cc8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3260,8 +3260,11 @@ static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
static void record_steal_time(struct kvm_vcpu *vcpu)
{
- struct kvm_host_map map;
- struct kvm_steal_time *st;
+ struct gfn_to_hva_cache *ghc = &vcpu->arch.st.cache;
+ struct kvm_steal_time __user *st;
+ struct kvm_memslots *slots;
+ u64 steal;
+ u32 version;
if (kvm_xen_msr_enabled(vcpu->kvm)) {
kvm_xen_runstate_set_running(vcpu);
@@ -3271,47 +3274,83 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
return;
- /* -EAGAIN is returned in atomic context so we can just return. */
- if (kvm_map_gfn(vcpu, vcpu->arch.st.msr_val >> PAGE_SHIFT,
- &map, &vcpu->arch.st.cache, false))
+ if (WARN_ON_ONCE(current->mm != vcpu->kvm->mm))
return;
- st = map.hva +
- offset_in_page(vcpu->arch.st.msr_val & KVM_STEAL_VALID_BITS);
+ slots = kvm_memslots(vcpu->kvm);
+
+ if (unlikely(slots->generation != ghc->generation ||
+ kvm_is_error_hva(ghc->hva) || !ghc->memslot)) {
+ gfn_t gfn = vcpu->arch.st.msr_val & KVM_STEAL_VALID_BITS;
+
+ /* We rely on the fact that it fits in a single page. */
+ BUILD_BUG_ON((sizeof(*st) - 1) & KVM_STEAL_VALID_BITS);
+
+ if (kvm_gfn_to_hva_cache_init(vcpu->kvm, ghc, gfn, sizeof(*st)) ||
+ kvm_is_error_hva(ghc->hva) || !ghc->memslot)
+ return;
+ }
+
+ st = (struct kvm_steal_time __user *)ghc->hva;
+ if (!user_access_begin(st, sizeof(*st)))
+ return;
/*
* Doing a TLB flush here, on the guest's behalf, can avoid
* expensive IPIs.
*/
if (guest_pv_has(vcpu, KVM_FEATURE_PV_TLB_FLUSH)) {
- u8 st_preempted = xchg(&st->preempted, 0);
+ u8 st_preempted = 0;
+ int err = -EFAULT;
+
+ asm volatile("1: xchgb %0, %2\n"
+ "xor %1, %1\n"
+ "2:\n"
+ _ASM_EXTABLE_UA(1b, 2b)
+ : "+r" (st_preempted),
+ "+&r" (err)
+ : "m" (st->preempted));
+ if (err)
+ goto out;
+
+ user_access_end();
+
+ vcpu->arch.st.preempted = 0;
trace_kvm_pv_tlb_flush(vcpu->vcpu_id,
st_preempted & KVM_VCPU_FLUSH_TLB);
if (st_preempted & KVM_VCPU_FLUSH_TLB)
kvm_vcpu_flush_tlb_guest(vcpu);
+
+ if (!user_access_begin(st, sizeof(*st)))
+ goto dirty;
} else {
- st->preempted = 0;
+ unsafe_put_user(0, &st->preempted, out);
+ vcpu->arch.st.preempted = 0;
}
- vcpu->arch.st.preempted = 0;
-
- if (st->version & 1)
- st->version += 1; /* first time write, random junk */
+ unsafe_get_user(version, &st->version, out);
+ if (version & 1)
+ version += 1; /* first time write, random junk */
- st->version += 1;
+ version += 1;
+ unsafe_put_user(version, &st->version, out);
smp_wmb();
- st->steal += current->sched_info.run_delay -
+ unsafe_get_user(steal, &st->steal, out);
+ steal += current->sched_info.run_delay -
vcpu->arch.st.last_steal;
vcpu->arch.st.last_steal = current->sched_info.run_delay;
+ unsafe_put_user(steal, &st->steal, out);
- smp_wmb();
-
- st->version += 1;
+ version += 1;
+ unsafe_put_user(version, &st->version, out);
- kvm_unmap_gfn(vcpu, &map, &vcpu->arch.st.cache, true, false);
+ out:
+ user_access_end();
+ dirty:
+ mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
}
int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
@@ -4351,8 +4390,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
{
- struct kvm_host_map map;
- struct kvm_steal_time *st;
+ struct gfn_to_hva_cache *ghc = &vcpu->arch.st.cache;
+ struct kvm_steal_time __user *st;
+ struct kvm_memslots *slots;
+ static const u8 preempted = KVM_VCPU_PREEMPTED;
if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
return;
@@ -4360,16 +4401,23 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
if (vcpu->arch.st.preempted)
return;
- if (kvm_map_gfn(vcpu, vcpu->arch.st.msr_val >> PAGE_SHIFT, &map,
- &vcpu->arch.st.cache, true))
+ /* This happens on process exit */
+ if (unlikely(current->mm != vcpu->kvm->mm))
return;
- st = map.hva +
- offset_in_page(vcpu->arch.st.msr_val & KVM_STEAL_VALID_BITS);
+ slots = kvm_memslots(vcpu->kvm);
+
+ if (unlikely(slots->generation != ghc->generation ||
+ kvm_is_error_hva(ghc->hva) || !ghc->memslot))
+ return;
- st->preempted = vcpu->arch.st.preempted = KVM_VCPU_PREEMPTED;
+ st = (struct kvm_steal_time __user *)ghc->hva;
+ BUILD_BUG_ON(sizeof(st->preempted) != sizeof(preempted));
- kvm_unmap_gfn(vcpu, &map, &vcpu->arch.st.cache, true, true);
+ if (!copy_to_user_nofault(&st->preempted, &preempted, sizeof(preempted)))
+ vcpu->arch.st.preempted = KVM_VCPU_PREEMPTED;
+
+ mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
}
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -11022,11 +11070,8 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
{
- struct gfn_to_pfn_cache *cache = &vcpu->arch.st.cache;
int idx;
- kvm_release_pfn(cache->pfn, cache->dirty, cache);
-
kvmclock_reset(vcpu);
static_call(kvm_x86_vcpu_free)(vcpu);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7e2175ebd695f17860c5bd4ad7616cce12ed4591 Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw2(a)infradead.org>
Date: Tue, 2 Nov 2021 17:36:39 +0000
Subject: [PATCH] KVM: x86: Fix recording of guest steal time / preempted
status
In commit b043138246a4 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is
not missed") we switched to using a gfn_to_pfn_cache for accessing the
guest steal time structure in order to allow for an atomic xchg of the
preempted field. This has a couple of problems.
Firstly, kvm_map_gfn() doesn't work at all for IOMEM pages when the
atomic flag is set, which it is in kvm_steal_time_set_preempted(). So a
guest vCPU using an IOMEM page for its steal time would never have its
preempted field set.
Secondly, the gfn_to_pfn_cache is not invalidated in all cases where it
should have been. There are two stages to the GFN->PFN conversion;
first the GFN is converted to a userspace HVA, and then that HVA is
looked up in the process page tables to find the underlying host PFN.
Correct invalidation of the latter would require being hooked up to the
MMU notifiers, but that doesn't happen---so it just keeps mapping and
unmapping the *wrong* PFN after the userspace page tables change.
In the !IOMEM case at least the stale page *is* pinned all the time it's
cached, so it won't be freed and reused by anyone else while still
receiving the steal time updates. The map/unmap dance only takes care
of the KVM administrivia such as marking the page dirty.
Until the gfn_to_pfn cache handles the remapping automatically by
integrating with the MMU notifiers, we might as well not get a
kernel mapping of it, and use the perfectly serviceable userspace HVA
that we already have. We just need to implement the atomic xchg on
the userspace address with appropriate exception handling, which is
fairly trivial.
Cc: stable(a)vger.kernel.org
Fixes: b043138246a4 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed")
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Message-Id: <3645b9b889dac6438394194bb5586a46b68d581f.camel(a)infradead.org>
[I didn't entirely agree with David's assessment of the
usefulness of the gfn_to_pfn cache, and integrated the outcome
of the discussion in the above commit message. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 88fce6ab4bbd..32345241e620 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -748,7 +748,7 @@ struct kvm_vcpu_arch {
u8 preempted;
u64 msr_val;
u64 last_steal;
- struct gfn_to_pfn_cache cache;
+ struct gfn_to_hva_cache cache;
} st;
u64 l1_tsc_offset;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ac83d873d65b..e7d2ef944cc8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3260,8 +3260,11 @@ static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
static void record_steal_time(struct kvm_vcpu *vcpu)
{
- struct kvm_host_map map;
- struct kvm_steal_time *st;
+ struct gfn_to_hva_cache *ghc = &vcpu->arch.st.cache;
+ struct kvm_steal_time __user *st;
+ struct kvm_memslots *slots;
+ u64 steal;
+ u32 version;
if (kvm_xen_msr_enabled(vcpu->kvm)) {
kvm_xen_runstate_set_running(vcpu);
@@ -3271,47 +3274,83 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
return;
- /* -EAGAIN is returned in atomic context so we can just return. */
- if (kvm_map_gfn(vcpu, vcpu->arch.st.msr_val >> PAGE_SHIFT,
- &map, &vcpu->arch.st.cache, false))
+ if (WARN_ON_ONCE(current->mm != vcpu->kvm->mm))
return;
- st = map.hva +
- offset_in_page(vcpu->arch.st.msr_val & KVM_STEAL_VALID_BITS);
+ slots = kvm_memslots(vcpu->kvm);
+
+ if (unlikely(slots->generation != ghc->generation ||
+ kvm_is_error_hva(ghc->hva) || !ghc->memslot)) {
+ gfn_t gfn = vcpu->arch.st.msr_val & KVM_STEAL_VALID_BITS;
+
+ /* We rely on the fact that it fits in a single page. */
+ BUILD_BUG_ON((sizeof(*st) - 1) & KVM_STEAL_VALID_BITS);
+
+ if (kvm_gfn_to_hva_cache_init(vcpu->kvm, ghc, gfn, sizeof(*st)) ||
+ kvm_is_error_hva(ghc->hva) || !ghc->memslot)
+ return;
+ }
+
+ st = (struct kvm_steal_time __user *)ghc->hva;
+ if (!user_access_begin(st, sizeof(*st)))
+ return;
/*
* Doing a TLB flush here, on the guest's behalf, can avoid
* expensive IPIs.
*/
if (guest_pv_has(vcpu, KVM_FEATURE_PV_TLB_FLUSH)) {
- u8 st_preempted = xchg(&st->preempted, 0);
+ u8 st_preempted = 0;
+ int err = -EFAULT;
+
+ asm volatile("1: xchgb %0, %2\n"
+ "xor %1, %1\n"
+ "2:\n"
+ _ASM_EXTABLE_UA(1b, 2b)
+ : "+r" (st_preempted),
+ "+&r" (err)
+ : "m" (st->preempted));
+ if (err)
+ goto out;
+
+ user_access_end();
+
+ vcpu->arch.st.preempted = 0;
trace_kvm_pv_tlb_flush(vcpu->vcpu_id,
st_preempted & KVM_VCPU_FLUSH_TLB);
if (st_preempted & KVM_VCPU_FLUSH_TLB)
kvm_vcpu_flush_tlb_guest(vcpu);
+
+ if (!user_access_begin(st, sizeof(*st)))
+ goto dirty;
} else {
- st->preempted = 0;
+ unsafe_put_user(0, &st->preempted, out);
+ vcpu->arch.st.preempted = 0;
}
- vcpu->arch.st.preempted = 0;
-
- if (st->version & 1)
- st->version += 1; /* first time write, random junk */
+ unsafe_get_user(version, &st->version, out);
+ if (version & 1)
+ version += 1; /* first time write, random junk */
- st->version += 1;
+ version += 1;
+ unsafe_put_user(version, &st->version, out);
smp_wmb();
- st->steal += current->sched_info.run_delay -
+ unsafe_get_user(steal, &st->steal, out);
+ steal += current->sched_info.run_delay -
vcpu->arch.st.last_steal;
vcpu->arch.st.last_steal = current->sched_info.run_delay;
+ unsafe_put_user(steal, &st->steal, out);
- smp_wmb();
-
- st->version += 1;
+ version += 1;
+ unsafe_put_user(version, &st->version, out);
- kvm_unmap_gfn(vcpu, &map, &vcpu->arch.st.cache, true, false);
+ out:
+ user_access_end();
+ dirty:
+ mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
}
int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
@@ -4351,8 +4390,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
{
- struct kvm_host_map map;
- struct kvm_steal_time *st;
+ struct gfn_to_hva_cache *ghc = &vcpu->arch.st.cache;
+ struct kvm_steal_time __user *st;
+ struct kvm_memslots *slots;
+ static const u8 preempted = KVM_VCPU_PREEMPTED;
if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
return;
@@ -4360,16 +4401,23 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
if (vcpu->arch.st.preempted)
return;
- if (kvm_map_gfn(vcpu, vcpu->arch.st.msr_val >> PAGE_SHIFT, &map,
- &vcpu->arch.st.cache, true))
+ /* This happens on process exit */
+ if (unlikely(current->mm != vcpu->kvm->mm))
return;
- st = map.hva +
- offset_in_page(vcpu->arch.st.msr_val & KVM_STEAL_VALID_BITS);
+ slots = kvm_memslots(vcpu->kvm);
+
+ if (unlikely(slots->generation != ghc->generation ||
+ kvm_is_error_hva(ghc->hva) || !ghc->memslot))
+ return;
- st->preempted = vcpu->arch.st.preempted = KVM_VCPU_PREEMPTED;
+ st = (struct kvm_steal_time __user *)ghc->hva;
+ BUILD_BUG_ON(sizeof(st->preempted) != sizeof(preempted));
- kvm_unmap_gfn(vcpu, &map, &vcpu->arch.st.cache, true, true);
+ if (!copy_to_user_nofault(&st->preempted, &preempted, sizeof(preempted)))
+ vcpu->arch.st.preempted = KVM_VCPU_PREEMPTED;
+
+ mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
}
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -11022,11 +11070,8 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
{
- struct gfn_to_pfn_cache *cache = &vcpu->arch.st.cache;
int idx;
- kvm_release_pfn(cache->pfn, cache->dirty, cache);
-
kvmclock_reset(vcpu);
static_call(kvm_x86_vcpu_free)(vcpu);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7e2175ebd695f17860c5bd4ad7616cce12ed4591 Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw2(a)infradead.org>
Date: Tue, 2 Nov 2021 17:36:39 +0000
Subject: [PATCH] KVM: x86: Fix recording of guest steal time / preempted
status
In commit b043138246a4 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is
not missed") we switched to using a gfn_to_pfn_cache for accessing the
guest steal time structure in order to allow for an atomic xchg of the
preempted field. This has a couple of problems.
Firstly, kvm_map_gfn() doesn't work at all for IOMEM pages when the
atomic flag is set, which it is in kvm_steal_time_set_preempted(). So a
guest vCPU using an IOMEM page for its steal time would never have its
preempted field set.
Secondly, the gfn_to_pfn_cache is not invalidated in all cases where it
should have been. There are two stages to the GFN->PFN conversion;
first the GFN is converted to a userspace HVA, and then that HVA is
looked up in the process page tables to find the underlying host PFN.
Correct invalidation of the latter would require being hooked up to the
MMU notifiers, but that doesn't happen---so it just keeps mapping and
unmapping the *wrong* PFN after the userspace page tables change.
In the !IOMEM case at least the stale page *is* pinned all the time it's
cached, so it won't be freed and reused by anyone else while still
receiving the steal time updates. The map/unmap dance only takes care
of the KVM administrivia such as marking the page dirty.
Until the gfn_to_pfn cache handles the remapping automatically by
integrating with the MMU notifiers, we might as well not get a
kernel mapping of it, and use the perfectly serviceable userspace HVA
that we already have. We just need to implement the atomic xchg on
the userspace address with appropriate exception handling, which is
fairly trivial.
Cc: stable(a)vger.kernel.org
Fixes: b043138246a4 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed")
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Message-Id: <3645b9b889dac6438394194bb5586a46b68d581f.camel(a)infradead.org>
[I didn't entirely agree with David's assessment of the
usefulness of the gfn_to_pfn cache, and integrated the outcome
of the discussion in the above commit message. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 88fce6ab4bbd..32345241e620 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -748,7 +748,7 @@ struct kvm_vcpu_arch {
u8 preempted;
u64 msr_val;
u64 last_steal;
- struct gfn_to_pfn_cache cache;
+ struct gfn_to_hva_cache cache;
} st;
u64 l1_tsc_offset;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ac83d873d65b..e7d2ef944cc8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3260,8 +3260,11 @@ static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
static void record_steal_time(struct kvm_vcpu *vcpu)
{
- struct kvm_host_map map;
- struct kvm_steal_time *st;
+ struct gfn_to_hva_cache *ghc = &vcpu->arch.st.cache;
+ struct kvm_steal_time __user *st;
+ struct kvm_memslots *slots;
+ u64 steal;
+ u32 version;
if (kvm_xen_msr_enabled(vcpu->kvm)) {
kvm_xen_runstate_set_running(vcpu);
@@ -3271,47 +3274,83 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
return;
- /* -EAGAIN is returned in atomic context so we can just return. */
- if (kvm_map_gfn(vcpu, vcpu->arch.st.msr_val >> PAGE_SHIFT,
- &map, &vcpu->arch.st.cache, false))
+ if (WARN_ON_ONCE(current->mm != vcpu->kvm->mm))
return;
- st = map.hva +
- offset_in_page(vcpu->arch.st.msr_val & KVM_STEAL_VALID_BITS);
+ slots = kvm_memslots(vcpu->kvm);
+
+ if (unlikely(slots->generation != ghc->generation ||
+ kvm_is_error_hva(ghc->hva) || !ghc->memslot)) {
+ gfn_t gfn = vcpu->arch.st.msr_val & KVM_STEAL_VALID_BITS;
+
+ /* We rely on the fact that it fits in a single page. */
+ BUILD_BUG_ON((sizeof(*st) - 1) & KVM_STEAL_VALID_BITS);
+
+ if (kvm_gfn_to_hva_cache_init(vcpu->kvm, ghc, gfn, sizeof(*st)) ||
+ kvm_is_error_hva(ghc->hva) || !ghc->memslot)
+ return;
+ }
+
+ st = (struct kvm_steal_time __user *)ghc->hva;
+ if (!user_access_begin(st, sizeof(*st)))
+ return;
/*
* Doing a TLB flush here, on the guest's behalf, can avoid
* expensive IPIs.
*/
if (guest_pv_has(vcpu, KVM_FEATURE_PV_TLB_FLUSH)) {
- u8 st_preempted = xchg(&st->preempted, 0);
+ u8 st_preempted = 0;
+ int err = -EFAULT;
+
+ asm volatile("1: xchgb %0, %2\n"
+ "xor %1, %1\n"
+ "2:\n"
+ _ASM_EXTABLE_UA(1b, 2b)
+ : "+r" (st_preempted),
+ "+&r" (err)
+ : "m" (st->preempted));
+ if (err)
+ goto out;
+
+ user_access_end();
+
+ vcpu->arch.st.preempted = 0;
trace_kvm_pv_tlb_flush(vcpu->vcpu_id,
st_preempted & KVM_VCPU_FLUSH_TLB);
if (st_preempted & KVM_VCPU_FLUSH_TLB)
kvm_vcpu_flush_tlb_guest(vcpu);
+
+ if (!user_access_begin(st, sizeof(*st)))
+ goto dirty;
} else {
- st->preempted = 0;
+ unsafe_put_user(0, &st->preempted, out);
+ vcpu->arch.st.preempted = 0;
}
- vcpu->arch.st.preempted = 0;
-
- if (st->version & 1)
- st->version += 1; /* first time write, random junk */
+ unsafe_get_user(version, &st->version, out);
+ if (version & 1)
+ version += 1; /* first time write, random junk */
- st->version += 1;
+ version += 1;
+ unsafe_put_user(version, &st->version, out);
smp_wmb();
- st->steal += current->sched_info.run_delay -
+ unsafe_get_user(steal, &st->steal, out);
+ steal += current->sched_info.run_delay -
vcpu->arch.st.last_steal;
vcpu->arch.st.last_steal = current->sched_info.run_delay;
+ unsafe_put_user(steal, &st->steal, out);
- smp_wmb();
-
- st->version += 1;
+ version += 1;
+ unsafe_put_user(version, &st->version, out);
- kvm_unmap_gfn(vcpu, &map, &vcpu->arch.st.cache, true, false);
+ out:
+ user_access_end();
+ dirty:
+ mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
}
int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
@@ -4351,8 +4390,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
{
- struct kvm_host_map map;
- struct kvm_steal_time *st;
+ struct gfn_to_hva_cache *ghc = &vcpu->arch.st.cache;
+ struct kvm_steal_time __user *st;
+ struct kvm_memslots *slots;
+ static const u8 preempted = KVM_VCPU_PREEMPTED;
if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
return;
@@ -4360,16 +4401,23 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
if (vcpu->arch.st.preempted)
return;
- if (kvm_map_gfn(vcpu, vcpu->arch.st.msr_val >> PAGE_SHIFT, &map,
- &vcpu->arch.st.cache, true))
+ /* This happens on process exit */
+ if (unlikely(current->mm != vcpu->kvm->mm))
return;
- st = map.hva +
- offset_in_page(vcpu->arch.st.msr_val & KVM_STEAL_VALID_BITS);
+ slots = kvm_memslots(vcpu->kvm);
+
+ if (unlikely(slots->generation != ghc->generation ||
+ kvm_is_error_hva(ghc->hva) || !ghc->memslot))
+ return;
- st->preempted = vcpu->arch.st.preempted = KVM_VCPU_PREEMPTED;
+ st = (struct kvm_steal_time __user *)ghc->hva;
+ BUILD_BUG_ON(sizeof(st->preempted) != sizeof(preempted));
- kvm_unmap_gfn(vcpu, &map, &vcpu->arch.st.cache, true, true);
+ if (!copy_to_user_nofault(&st->preempted, &preempted, sizeof(preempted)))
+ vcpu->arch.st.preempted = KVM_VCPU_PREEMPTED;
+
+ mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
}
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -11022,11 +11070,8 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
{
- struct gfn_to_pfn_cache *cache = &vcpu->arch.st.cache;
int idx;
- kvm_release_pfn(cache->pfn, cache->dirty, cache);
-
kvmclock_reset(vcpu);
static_call(kvm_x86_vcpu_free)(vcpu);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fe6f45f6ba22d625a8500cbad0237c60dd3117ee Mon Sep 17 00:00:00 2001
From: Yang Yingliang <yangyingliang(a)huawei.com>
Date: Tue, 12 Oct 2021 14:36:24 +0800
Subject: [PATCH] iio: core: check return value when calling dev_set_name()
I got a null-ptr-deref report when doing fault injection test:
BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:strlen+0x0/0x20
Call Trace:
start_creating+0x199/0x2f0
debugfs_create_dir+0x25/0x430
__iio_device_register+0x4da/0x1b40 [industrialio]
__devm_iio_device_register+0x22/0x80 [industrialio]
max1027_probe+0x639/0x860 [max1027]
spi_probe+0x183/0x210
really_probe+0x285/0xc30
If dev_set_name() fails, the dev_name() is null, check the return
value of dev_set_name() to avoid the null-ptr-deref.
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Fixes: e553f182d55b ("staging: iio: core: Introduce debugfs support...")
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Cc: <Stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20211012063624.3167460-1-yangyingliang@huawei.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c
index 2dbb37e09b8c..48fda6a79076 100644
--- a/drivers/iio/industrialio-core.c
+++ b/drivers/iio/industrialio-core.c
@@ -1664,7 +1664,13 @@ struct iio_dev *iio_device_alloc(struct device *parent, int sizeof_priv)
kfree(iio_dev_opaque);
return NULL;
}
- dev_set_name(&indio_dev->dev, "iio:device%d", iio_dev_opaque->id);
+
+ if (dev_set_name(&indio_dev->dev, "iio:device%d", iio_dev_opaque->id)) {
+ ida_simple_remove(&iio_ida, iio_dev_opaque->id);
+ kfree(iio_dev_opaque);
+ return NULL;
+ }
+
INIT_LIST_HEAD(&iio_dev_opaque->buffer_list);
INIT_LIST_HEAD(&iio_dev_opaque->ioctl_handlers);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fe6f45f6ba22d625a8500cbad0237c60dd3117ee Mon Sep 17 00:00:00 2001
From: Yang Yingliang <yangyingliang(a)huawei.com>
Date: Tue, 12 Oct 2021 14:36:24 +0800
Subject: [PATCH] iio: core: check return value when calling dev_set_name()
I got a null-ptr-deref report when doing fault injection test:
BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:strlen+0x0/0x20
Call Trace:
start_creating+0x199/0x2f0
debugfs_create_dir+0x25/0x430
__iio_device_register+0x4da/0x1b40 [industrialio]
__devm_iio_device_register+0x22/0x80 [industrialio]
max1027_probe+0x639/0x860 [max1027]
spi_probe+0x183/0x210
really_probe+0x285/0xc30
If dev_set_name() fails, the dev_name() is null, check the return
value of dev_set_name() to avoid the null-ptr-deref.
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Fixes: e553f182d55b ("staging: iio: core: Introduce debugfs support...")
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Cc: <Stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20211012063624.3167460-1-yangyingliang@huawei.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c
index 2dbb37e09b8c..48fda6a79076 100644
--- a/drivers/iio/industrialio-core.c
+++ b/drivers/iio/industrialio-core.c
@@ -1664,7 +1664,13 @@ struct iio_dev *iio_device_alloc(struct device *parent, int sizeof_priv)
kfree(iio_dev_opaque);
return NULL;
}
- dev_set_name(&indio_dev->dev, "iio:device%d", iio_dev_opaque->id);
+
+ if (dev_set_name(&indio_dev->dev, "iio:device%d", iio_dev_opaque->id)) {
+ ida_simple_remove(&iio_ida, iio_dev_opaque->id);
+ kfree(iio_dev_opaque);
+ return NULL;
+ }
+
INIT_LIST_HEAD(&iio_dev_opaque->buffer_list);
INIT_LIST_HEAD(&iio_dev_opaque->ioctl_handlers);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fe6f45f6ba22d625a8500cbad0237c60dd3117ee Mon Sep 17 00:00:00 2001
From: Yang Yingliang <yangyingliang(a)huawei.com>
Date: Tue, 12 Oct 2021 14:36:24 +0800
Subject: [PATCH] iio: core: check return value when calling dev_set_name()
I got a null-ptr-deref report when doing fault injection test:
BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:strlen+0x0/0x20
Call Trace:
start_creating+0x199/0x2f0
debugfs_create_dir+0x25/0x430
__iio_device_register+0x4da/0x1b40 [industrialio]
__devm_iio_device_register+0x22/0x80 [industrialio]
max1027_probe+0x639/0x860 [max1027]
spi_probe+0x183/0x210
really_probe+0x285/0xc30
If dev_set_name() fails, the dev_name() is null, check the return
value of dev_set_name() to avoid the null-ptr-deref.
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Fixes: e553f182d55b ("staging: iio: core: Introduce debugfs support...")
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Cc: <Stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20211012063624.3167460-1-yangyingliang@huawei.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c
index 2dbb37e09b8c..48fda6a79076 100644
--- a/drivers/iio/industrialio-core.c
+++ b/drivers/iio/industrialio-core.c
@@ -1664,7 +1664,13 @@ struct iio_dev *iio_device_alloc(struct device *parent, int sizeof_priv)
kfree(iio_dev_opaque);
return NULL;
}
- dev_set_name(&indio_dev->dev, "iio:device%d", iio_dev_opaque->id);
+
+ if (dev_set_name(&indio_dev->dev, "iio:device%d", iio_dev_opaque->id)) {
+ ida_simple_remove(&iio_ida, iio_dev_opaque->id);
+ kfree(iio_dev_opaque);
+ return NULL;
+ }
+
INIT_LIST_HEAD(&iio_dev_opaque->buffer_list);
INIT_LIST_HEAD(&iio_dev_opaque->ioctl_handlers);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fe6f45f6ba22d625a8500cbad0237c60dd3117ee Mon Sep 17 00:00:00 2001
From: Yang Yingliang <yangyingliang(a)huawei.com>
Date: Tue, 12 Oct 2021 14:36:24 +0800
Subject: [PATCH] iio: core: check return value when calling dev_set_name()
I got a null-ptr-deref report when doing fault injection test:
BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:strlen+0x0/0x20
Call Trace:
start_creating+0x199/0x2f0
debugfs_create_dir+0x25/0x430
__iio_device_register+0x4da/0x1b40 [industrialio]
__devm_iio_device_register+0x22/0x80 [industrialio]
max1027_probe+0x639/0x860 [max1027]
spi_probe+0x183/0x210
really_probe+0x285/0xc30
If dev_set_name() fails, the dev_name() is null, check the return
value of dev_set_name() to avoid the null-ptr-deref.
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Fixes: e553f182d55b ("staging: iio: core: Introduce debugfs support...")
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Cc: <Stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20211012063624.3167460-1-yangyingliang@huawei.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c
index 2dbb37e09b8c..48fda6a79076 100644
--- a/drivers/iio/industrialio-core.c
+++ b/drivers/iio/industrialio-core.c
@@ -1664,7 +1664,13 @@ struct iio_dev *iio_device_alloc(struct device *parent, int sizeof_priv)
kfree(iio_dev_opaque);
return NULL;
}
- dev_set_name(&indio_dev->dev, "iio:device%d", iio_dev_opaque->id);
+
+ if (dev_set_name(&indio_dev->dev, "iio:device%d", iio_dev_opaque->id)) {
+ ida_simple_remove(&iio_ida, iio_dev_opaque->id);
+ kfree(iio_dev_opaque);
+ return NULL;
+ }
+
INIT_LIST_HEAD(&iio_dev_opaque->buffer_list);
INIT_LIST_HEAD(&iio_dev_opaque->ioctl_handlers);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fe6f45f6ba22d625a8500cbad0237c60dd3117ee Mon Sep 17 00:00:00 2001
From: Yang Yingliang <yangyingliang(a)huawei.com>
Date: Tue, 12 Oct 2021 14:36:24 +0800
Subject: [PATCH] iio: core: check return value when calling dev_set_name()
I got a null-ptr-deref report when doing fault injection test:
BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:strlen+0x0/0x20
Call Trace:
start_creating+0x199/0x2f0
debugfs_create_dir+0x25/0x430
__iio_device_register+0x4da/0x1b40 [industrialio]
__devm_iio_device_register+0x22/0x80 [industrialio]
max1027_probe+0x639/0x860 [max1027]
spi_probe+0x183/0x210
really_probe+0x285/0xc30
If dev_set_name() fails, the dev_name() is null, check the return
value of dev_set_name() to avoid the null-ptr-deref.
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Fixes: e553f182d55b ("staging: iio: core: Introduce debugfs support...")
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Cc: <Stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20211012063624.3167460-1-yangyingliang@huawei.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c
index 2dbb37e09b8c..48fda6a79076 100644
--- a/drivers/iio/industrialio-core.c
+++ b/drivers/iio/industrialio-core.c
@@ -1664,7 +1664,13 @@ struct iio_dev *iio_device_alloc(struct device *parent, int sizeof_priv)
kfree(iio_dev_opaque);
return NULL;
}
- dev_set_name(&indio_dev->dev, "iio:device%d", iio_dev_opaque->id);
+
+ if (dev_set_name(&indio_dev->dev, "iio:device%d", iio_dev_opaque->id)) {
+ ida_simple_remove(&iio_ida, iio_dev_opaque->id);
+ kfree(iio_dev_opaque);
+ return NULL;
+ }
+
INIT_LIST_HEAD(&iio_dev_opaque->buffer_list);
INIT_LIST_HEAD(&iio_dev_opaque->ioctl_handlers);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fe6f45f6ba22d625a8500cbad0237c60dd3117ee Mon Sep 17 00:00:00 2001
From: Yang Yingliang <yangyingliang(a)huawei.com>
Date: Tue, 12 Oct 2021 14:36:24 +0800
Subject: [PATCH] iio: core: check return value when calling dev_set_name()
I got a null-ptr-deref report when doing fault injection test:
BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: 0010:strlen+0x0/0x20
Call Trace:
start_creating+0x199/0x2f0
debugfs_create_dir+0x25/0x430
__iio_device_register+0x4da/0x1b40 [industrialio]
__devm_iio_device_register+0x22/0x80 [industrialio]
max1027_probe+0x639/0x860 [max1027]
spi_probe+0x183/0x210
really_probe+0x285/0xc30
If dev_set_name() fails, the dev_name() is null, check the return
value of dev_set_name() to avoid the null-ptr-deref.
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Fixes: e553f182d55b ("staging: iio: core: Introduce debugfs support...")
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Cc: <Stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20211012063624.3167460-1-yangyingliang@huawei.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c
index 2dbb37e09b8c..48fda6a79076 100644
--- a/drivers/iio/industrialio-core.c
+++ b/drivers/iio/industrialio-core.c
@@ -1664,7 +1664,13 @@ struct iio_dev *iio_device_alloc(struct device *parent, int sizeof_priv)
kfree(iio_dev_opaque);
return NULL;
}
- dev_set_name(&indio_dev->dev, "iio:device%d", iio_dev_opaque->id);
+
+ if (dev_set_name(&indio_dev->dev, "iio:device%d", iio_dev_opaque->id)) {
+ ida_simple_remove(&iio_ida, iio_dev_opaque->id);
+ kfree(iio_dev_opaque);
+ return NULL;
+ }
+
INIT_LIST_HEAD(&iio_dev_opaque->buffer_list);
INIT_LIST_HEAD(&iio_dev_opaque->ioctl_handlers);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e775eb9fc2a4107f03222fa48bc95c2c82427e64 Mon Sep 17 00:00:00 2001
From: Meng Li <Meng.Li(a)windriver.com>
Date: Tue, 19 Oct 2021 10:32:41 +0800
Subject: [PATCH] soc: fsl: dpio: replace smp_processor_id with
raw_smp_processor_id
When enable debug kernel configs,there will be calltrace as below:
BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
caller is debug_smp_processor_id+0x20/0x30
CPU: 6 PID: 1 Comm: swapper/0 Not tainted 5.10.63-yocto-standard #1
Hardware name: NXP Layerscape LX2160ARDB (DT)
Call trace:
dump_backtrace+0x0/0x1a0
show_stack+0x24/0x30
dump_stack+0xf0/0x13c
check_preemption_disabled+0x100/0x110
debug_smp_processor_id+0x20/0x30
dpaa2_io_query_fq_count+0xdc/0x154
dpaa2_eth_stop+0x144/0x314
__dev_close_many+0xdc/0x160
__dev_change_flags+0xe8/0x220
dev_change_flags+0x30/0x70
ic_close_devs+0x50/0x78
ip_auto_config+0xed0/0xf10
do_one_initcall+0xac/0x460
kernel_init_freeable+0x30c/0x378
kernel_init+0x20/0x128
ret_from_fork+0x10/0x38
Based on comment in the context, it doesn't matter whether
preemption is disable or not. So, replace smp_processor_id()
with raw_smp_processor_id() to avoid above call trace.
Fixes: c89105c9b390 ("staging: fsl-mc: Move DPIO from staging to drivers/soc/fsl")
Cc: stable(a)vger.kernel.org
Signed-off-by: Meng Li <Meng.Li(a)windriver.com>
Signed-off-by: Li Yang <leoyang.li(a)nxp.com>
diff --git a/drivers/soc/fsl/dpio/dpio-service.c b/drivers/soc/fsl/dpio/dpio-service.c
index 7351f3030550..779c319a4b82 100644
--- a/drivers/soc/fsl/dpio/dpio-service.c
+++ b/drivers/soc/fsl/dpio/dpio-service.c
@@ -59,7 +59,7 @@ static inline struct dpaa2_io *service_select_by_cpu(struct dpaa2_io *d,
* potentially being migrated away.
*/
if (cpu < 0)
- cpu = smp_processor_id();
+ cpu = raw_smp_processor_id();
/* If a specific cpu was requested, pick it up immediately */
return dpio_by_cpu[cpu];
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e775eb9fc2a4107f03222fa48bc95c2c82427e64 Mon Sep 17 00:00:00 2001
From: Meng Li <Meng.Li(a)windriver.com>
Date: Tue, 19 Oct 2021 10:32:41 +0800
Subject: [PATCH] soc: fsl: dpio: replace smp_processor_id with
raw_smp_processor_id
When enable debug kernel configs,there will be calltrace as below:
BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
caller is debug_smp_processor_id+0x20/0x30
CPU: 6 PID: 1 Comm: swapper/0 Not tainted 5.10.63-yocto-standard #1
Hardware name: NXP Layerscape LX2160ARDB (DT)
Call trace:
dump_backtrace+0x0/0x1a0
show_stack+0x24/0x30
dump_stack+0xf0/0x13c
check_preemption_disabled+0x100/0x110
debug_smp_processor_id+0x20/0x30
dpaa2_io_query_fq_count+0xdc/0x154
dpaa2_eth_stop+0x144/0x314
__dev_close_many+0xdc/0x160
__dev_change_flags+0xe8/0x220
dev_change_flags+0x30/0x70
ic_close_devs+0x50/0x78
ip_auto_config+0xed0/0xf10
do_one_initcall+0xac/0x460
kernel_init_freeable+0x30c/0x378
kernel_init+0x20/0x128
ret_from_fork+0x10/0x38
Based on comment in the context, it doesn't matter whether
preemption is disable or not. So, replace smp_processor_id()
with raw_smp_processor_id() to avoid above call trace.
Fixes: c89105c9b390 ("staging: fsl-mc: Move DPIO from staging to drivers/soc/fsl")
Cc: stable(a)vger.kernel.org
Signed-off-by: Meng Li <Meng.Li(a)windriver.com>
Signed-off-by: Li Yang <leoyang.li(a)nxp.com>
diff --git a/drivers/soc/fsl/dpio/dpio-service.c b/drivers/soc/fsl/dpio/dpio-service.c
index 7351f3030550..779c319a4b82 100644
--- a/drivers/soc/fsl/dpio/dpio-service.c
+++ b/drivers/soc/fsl/dpio/dpio-service.c
@@ -59,7 +59,7 @@ static inline struct dpaa2_io *service_select_by_cpu(struct dpaa2_io *d,
* potentially being migrated away.
*/
if (cpu < 0)
- cpu = smp_processor_id();
+ cpu = raw_smp_processor_id();
/* If a specific cpu was requested, pick it up immediately */
return dpio_by_cpu[cpu];
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 1869023e24c0de73a160a424dac4621cefd628ae Mon Sep 17 00:00:00 2001
From: Andrew Gabbasov <andrew_gabbasov(a)mentor.com>
Date: Wed, 22 Sep 2021 13:48:30 -0500
Subject: [PATCH] memory: renesas-rpc-if: Avoid unaligned bus access for
HyperFlash
HyperFlash devices in Renesas SoCs use 2-bytes addressing, according
to HW manual paragraph 62.3.3 (which officially describes Serial Flash
access, but seems to be applicable to HyperFlash too). And 1-byte bus
read operations to 2-bytes unaligned addresses in external address space
read mode work incorrectly (returns the other byte from the same word).
Function memcpy_fromio(), used by the driver to read data from the bus,
in ARM64 architecture (to which Renesas cores belong) uses 8-bytes
bus accesses for appropriate aligned addresses, and 1-bytes accesses
for other addresses. This results in incorrect data read from HyperFlash
in unaligned cases.
This issue can be reproduced using something like the following commands
(where mtd1 is a parition on Hyperflash storage, defined properly
in a device tree):
[Correct fragment, read from Hyperflash]
root@rcar-gen3:~# dd if=/dev/mtd1 of=/tmp/zz bs=32 count=1
root@rcar-gen3:~# hexdump -C /tmp/zz
00000000 f4 03 00 aa f5 03 01 aa f6 03 02 aa f7 03 03 aa |................|
00000010 00 00 80 d2 40 20 18 d5 00 06 81 d2 a0 18 a6 f2 |....@ ..........|
00000020
[Incorrect read of the same fragment: see the difference at offsets 8-11]
root@rcar-gen3:~# dd if=/dev/mtd1 of=/tmp/zz bs=12 count=1
root@rcar-gen3:~# hexdump -C /tmp/zz
00000000 f4 03 00 aa f5 03 01 aa 03 03 aa aa |............|
0000000c
Fix this issue by creating a local replacement of the copying function,
that performs only properly aligned bus accesses, and is used for reading
from HyperFlash.
Fixes: ca7d8b980b67f ("memory: add Renesas RPC-IF driver")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Gabbasov <andrew_gabbasov(a)mentor.com>
Link: https://lore.kernel.org/r/20210922184830.29147-1-andrew_gabbasov@mentor.com
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
diff --git a/drivers/memory/renesas-rpc-if.c b/drivers/memory/renesas-rpc-if.c
index 77a011d5ff8c..7435baad0007 100644
--- a/drivers/memory/renesas-rpc-if.c
+++ b/drivers/memory/renesas-rpc-if.c
@@ -185,7 +185,6 @@ static int rpcif_reg_read(void *context, unsigned int reg, unsigned int *val)
*val = readl(rpc->base + reg);
return 0;
-
}
static int rpcif_reg_write(void *context, unsigned int reg, unsigned int val)
@@ -545,6 +544,48 @@ err_out:
}
EXPORT_SYMBOL(rpcif_manual_xfer);
+static void memcpy_fromio_readw(void *to,
+ const void __iomem *from,
+ size_t count)
+{
+ const int maxw = (IS_ENABLED(CONFIG_64BIT)) ? 8 : 4;
+ u8 buf[2];
+
+ if (count && ((unsigned long)from & 1)) {
+ *(u16 *)buf = __raw_readw((void __iomem *)((unsigned long)from & ~1));
+ *(u8 *)to = buf[1];
+ from++;
+ to++;
+ count--;
+ }
+ while (count >= 2 && !IS_ALIGNED((unsigned long)from, maxw)) {
+ *(u16 *)to = __raw_readw(from);
+ from += 2;
+ to += 2;
+ count -= 2;
+ }
+ while (count >= maxw) {
+#ifdef CONFIG_64BIT
+ *(u64 *)to = __raw_readq(from);
+#else
+ *(u32 *)to = __raw_readl(from);
+#endif
+ from += maxw;
+ to += maxw;
+ count -= maxw;
+ }
+ while (count >= 2) {
+ *(u16 *)to = __raw_readw(from);
+ from += 2;
+ to += 2;
+ count -= 2;
+ }
+ if (count) {
+ *(u16 *)buf = __raw_readw(from);
+ *(u8 *)to = buf[0];
+ }
+}
+
ssize_t rpcif_dirmap_read(struct rpcif *rpc, u64 offs, size_t len, void *buf)
{
loff_t from = offs & (RPCIF_DIRMAP_SIZE - 1);
@@ -566,7 +607,10 @@ ssize_t rpcif_dirmap_read(struct rpcif *rpc, u64 offs, size_t len, void *buf)
regmap_write(rpc->regmap, RPCIF_DRDMCR, rpc->dummy);
regmap_write(rpc->regmap, RPCIF_DRDRENR, rpc->ddr);
- memcpy_fromio(buf, rpc->dirmap + from, len);
+ if (rpc->bus_size == 2)
+ memcpy_fromio_readw(buf, rpc->dirmap + from, len);
+ else
+ memcpy_fromio(buf, rpc->dirmap + from, len);
pm_runtime_put(rpc->dev);
The patch below does not apply to the 5.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 1869023e24c0de73a160a424dac4621cefd628ae Mon Sep 17 00:00:00 2001
From: Andrew Gabbasov <andrew_gabbasov(a)mentor.com>
Date: Wed, 22 Sep 2021 13:48:30 -0500
Subject: [PATCH] memory: renesas-rpc-if: Avoid unaligned bus access for
HyperFlash
HyperFlash devices in Renesas SoCs use 2-bytes addressing, according
to HW manual paragraph 62.3.3 (which officially describes Serial Flash
access, but seems to be applicable to HyperFlash too). And 1-byte bus
read operations to 2-bytes unaligned addresses in external address space
read mode work incorrectly (returns the other byte from the same word).
Function memcpy_fromio(), used by the driver to read data from the bus,
in ARM64 architecture (to which Renesas cores belong) uses 8-bytes
bus accesses for appropriate aligned addresses, and 1-bytes accesses
for other addresses. This results in incorrect data read from HyperFlash
in unaligned cases.
This issue can be reproduced using something like the following commands
(where mtd1 is a parition on Hyperflash storage, defined properly
in a device tree):
[Correct fragment, read from Hyperflash]
root@rcar-gen3:~# dd if=/dev/mtd1 of=/tmp/zz bs=32 count=1
root@rcar-gen3:~# hexdump -C /tmp/zz
00000000 f4 03 00 aa f5 03 01 aa f6 03 02 aa f7 03 03 aa |................|
00000010 00 00 80 d2 40 20 18 d5 00 06 81 d2 a0 18 a6 f2 |....@ ..........|
00000020
[Incorrect read of the same fragment: see the difference at offsets 8-11]
root@rcar-gen3:~# dd if=/dev/mtd1 of=/tmp/zz bs=12 count=1
root@rcar-gen3:~# hexdump -C /tmp/zz
00000000 f4 03 00 aa f5 03 01 aa 03 03 aa aa |............|
0000000c
Fix this issue by creating a local replacement of the copying function,
that performs only properly aligned bus accesses, and is used for reading
from HyperFlash.
Fixes: ca7d8b980b67f ("memory: add Renesas RPC-IF driver")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Gabbasov <andrew_gabbasov(a)mentor.com>
Link: https://lore.kernel.org/r/20210922184830.29147-1-andrew_gabbasov@mentor.com
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
diff --git a/drivers/memory/renesas-rpc-if.c b/drivers/memory/renesas-rpc-if.c
index 77a011d5ff8c..7435baad0007 100644
--- a/drivers/memory/renesas-rpc-if.c
+++ b/drivers/memory/renesas-rpc-if.c
@@ -185,7 +185,6 @@ static int rpcif_reg_read(void *context, unsigned int reg, unsigned int *val)
*val = readl(rpc->base + reg);
return 0;
-
}
static int rpcif_reg_write(void *context, unsigned int reg, unsigned int val)
@@ -545,6 +544,48 @@ err_out:
}
EXPORT_SYMBOL(rpcif_manual_xfer);
+static void memcpy_fromio_readw(void *to,
+ const void __iomem *from,
+ size_t count)
+{
+ const int maxw = (IS_ENABLED(CONFIG_64BIT)) ? 8 : 4;
+ u8 buf[2];
+
+ if (count && ((unsigned long)from & 1)) {
+ *(u16 *)buf = __raw_readw((void __iomem *)((unsigned long)from & ~1));
+ *(u8 *)to = buf[1];
+ from++;
+ to++;
+ count--;
+ }
+ while (count >= 2 && !IS_ALIGNED((unsigned long)from, maxw)) {
+ *(u16 *)to = __raw_readw(from);
+ from += 2;
+ to += 2;
+ count -= 2;
+ }
+ while (count >= maxw) {
+#ifdef CONFIG_64BIT
+ *(u64 *)to = __raw_readq(from);
+#else
+ *(u32 *)to = __raw_readl(from);
+#endif
+ from += maxw;
+ to += maxw;
+ count -= maxw;
+ }
+ while (count >= 2) {
+ *(u16 *)to = __raw_readw(from);
+ from += 2;
+ to += 2;
+ count -= 2;
+ }
+ if (count) {
+ *(u16 *)buf = __raw_readw(from);
+ *(u8 *)to = buf[0];
+ }
+}
+
ssize_t rpcif_dirmap_read(struct rpcif *rpc, u64 offs, size_t len, void *buf)
{
loff_t from = offs & (RPCIF_DIRMAP_SIZE - 1);
@@ -566,7 +607,10 @@ ssize_t rpcif_dirmap_read(struct rpcif *rpc, u64 offs, size_t len, void *buf)
regmap_write(rpc->regmap, RPCIF_DRDMCR, rpc->dummy);
regmap_write(rpc->regmap, RPCIF_DRDRENR, rpc->ddr);
- memcpy_fromio(buf, rpc->dirmap + from, len);
+ if (rpc->bus_size == 2)
+ memcpy_fromio_readw(buf, rpc->dirmap + from, len);
+ else
+ memcpy_fromio(buf, rpc->dirmap + from, len);
pm_runtime_put(rpc->dev);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 1869023e24c0de73a160a424dac4621cefd628ae Mon Sep 17 00:00:00 2001
From: Andrew Gabbasov <andrew_gabbasov(a)mentor.com>
Date: Wed, 22 Sep 2021 13:48:30 -0500
Subject: [PATCH] memory: renesas-rpc-if: Avoid unaligned bus access for
HyperFlash
HyperFlash devices in Renesas SoCs use 2-bytes addressing, according
to HW manual paragraph 62.3.3 (which officially describes Serial Flash
access, but seems to be applicable to HyperFlash too). And 1-byte bus
read operations to 2-bytes unaligned addresses in external address space
read mode work incorrectly (returns the other byte from the same word).
Function memcpy_fromio(), used by the driver to read data from the bus,
in ARM64 architecture (to which Renesas cores belong) uses 8-bytes
bus accesses for appropriate aligned addresses, and 1-bytes accesses
for other addresses. This results in incorrect data read from HyperFlash
in unaligned cases.
This issue can be reproduced using something like the following commands
(where mtd1 is a parition on Hyperflash storage, defined properly
in a device tree):
[Correct fragment, read from Hyperflash]
root@rcar-gen3:~# dd if=/dev/mtd1 of=/tmp/zz bs=32 count=1
root@rcar-gen3:~# hexdump -C /tmp/zz
00000000 f4 03 00 aa f5 03 01 aa f6 03 02 aa f7 03 03 aa |................|
00000010 00 00 80 d2 40 20 18 d5 00 06 81 d2 a0 18 a6 f2 |....@ ..........|
00000020
[Incorrect read of the same fragment: see the difference at offsets 8-11]
root@rcar-gen3:~# dd if=/dev/mtd1 of=/tmp/zz bs=12 count=1
root@rcar-gen3:~# hexdump -C /tmp/zz
00000000 f4 03 00 aa f5 03 01 aa 03 03 aa aa |............|
0000000c
Fix this issue by creating a local replacement of the copying function,
that performs only properly aligned bus accesses, and is used for reading
from HyperFlash.
Fixes: ca7d8b980b67f ("memory: add Renesas RPC-IF driver")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Gabbasov <andrew_gabbasov(a)mentor.com>
Link: https://lore.kernel.org/r/20210922184830.29147-1-andrew_gabbasov@mentor.com
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
diff --git a/drivers/memory/renesas-rpc-if.c b/drivers/memory/renesas-rpc-if.c
index 77a011d5ff8c..7435baad0007 100644
--- a/drivers/memory/renesas-rpc-if.c
+++ b/drivers/memory/renesas-rpc-if.c
@@ -185,7 +185,6 @@ static int rpcif_reg_read(void *context, unsigned int reg, unsigned int *val)
*val = readl(rpc->base + reg);
return 0;
-
}
static int rpcif_reg_write(void *context, unsigned int reg, unsigned int val)
@@ -545,6 +544,48 @@ err_out:
}
EXPORT_SYMBOL(rpcif_manual_xfer);
+static void memcpy_fromio_readw(void *to,
+ const void __iomem *from,
+ size_t count)
+{
+ const int maxw = (IS_ENABLED(CONFIG_64BIT)) ? 8 : 4;
+ u8 buf[2];
+
+ if (count && ((unsigned long)from & 1)) {
+ *(u16 *)buf = __raw_readw((void __iomem *)((unsigned long)from & ~1));
+ *(u8 *)to = buf[1];
+ from++;
+ to++;
+ count--;
+ }
+ while (count >= 2 && !IS_ALIGNED((unsigned long)from, maxw)) {
+ *(u16 *)to = __raw_readw(from);
+ from += 2;
+ to += 2;
+ count -= 2;
+ }
+ while (count >= maxw) {
+#ifdef CONFIG_64BIT
+ *(u64 *)to = __raw_readq(from);
+#else
+ *(u32 *)to = __raw_readl(from);
+#endif
+ from += maxw;
+ to += maxw;
+ count -= maxw;
+ }
+ while (count >= 2) {
+ *(u16 *)to = __raw_readw(from);
+ from += 2;
+ to += 2;
+ count -= 2;
+ }
+ if (count) {
+ *(u16 *)buf = __raw_readw(from);
+ *(u8 *)to = buf[0];
+ }
+}
+
ssize_t rpcif_dirmap_read(struct rpcif *rpc, u64 offs, size_t len, void *buf)
{
loff_t from = offs & (RPCIF_DIRMAP_SIZE - 1);
@@ -566,7 +607,10 @@ ssize_t rpcif_dirmap_read(struct rpcif *rpc, u64 offs, size_t len, void *buf)
regmap_write(rpc->regmap, RPCIF_DRDMCR, rpc->dummy);
regmap_write(rpc->regmap, RPCIF_DRDRENR, rpc->ddr);
- memcpy_fromio(buf, rpc->dirmap + from, len);
+ if (rpc->bus_size == 2)
+ memcpy_fromio_readw(buf, rpc->dirmap + from, len);
+ else
+ memcpy_fromio(buf, rpc->dirmap + from, len);
pm_runtime_put(rpc->dev);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6b51b02a3a0ac49dfe302818d0746a799545e4e9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20K=C3=B6nig?= <christian.koenig(a)amd.com>
Date: Tue, 15 Jun 2021 13:12:33 +0200
Subject: [PATCH] dma-buf: fix and rework dma_buf_poll v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Daniel pointed me towards this function and there are multiple obvious problems
in the implementation.
First of all the retry loop is not working as intended. In general the retry
makes only sense if you grab the reference first and then check the sequence
values.
Then we should always also wait for the exclusive fence.
It's also good practice to keep the reference around when installing callbacks
to fences you don't own.
And last the whole implementation was unnecessary complex and rather hard to
understand which could lead to probably unexpected behavior of the IOCTL.
Fix all this by reworking the implementation from scratch. Dropping the
whole RCU approach and taking the lock instead.
Only mildly tested and needs a thoughtful review of the code.
Pushing through drm-misc-next to avoid merge conflicts and give the code
another round of testing.
v2: fix the reference counting as well
v3: keep the excl fence handling as is for stable
v4: back to testing all fences, drop RCU
v5: handle in and out separately
v6: add missing clear of events
v7: change coding style as suggested by Michel, drop unused variables
Signed-off-by: Christian König <christian.koenig(a)amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Tested-by: Michel Dänzer <mdaenzer(a)redhat.com>
CC: stable(a)vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20210720131110.88512-1-christ…
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 474de2d988ca..61e20ae7b08b 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -74,7 +74,7 @@ static void dma_buf_release(struct dentry *dentry)
* If you hit this BUG() it means someone dropped their ref to the
* dma-buf while still having pending operation to the buffer.
*/
- BUG_ON(dmabuf->cb_shared.active || dmabuf->cb_excl.active);
+ BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
dma_buf_stats_teardown(dmabuf);
dmabuf->ops->release(dmabuf);
@@ -206,16 +206,55 @@ static void dma_buf_poll_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
wake_up_locked_poll(dcb->poll, dcb->active);
dcb->active = 0;
spin_unlock_irqrestore(&dcb->poll->lock, flags);
+ dma_fence_put(fence);
+}
+
+static bool dma_buf_poll_shared(struct dma_resv *resv,
+ struct dma_buf_poll_cb_t *dcb)
+{
+ struct dma_resv_list *fobj = dma_resv_shared_list(resv);
+ struct dma_fence *fence;
+ int i, r;
+
+ if (!fobj)
+ return false;
+
+ for (i = 0; i < fobj->shared_count; ++i) {
+ fence = rcu_dereference_protected(fobj->shared[i],
+ dma_resv_held(resv));
+ dma_fence_get(fence);
+ r = dma_fence_add_callback(fence, &dcb->cb, dma_buf_poll_cb);
+ if (!r)
+ return true;
+ dma_fence_put(fence);
+ }
+
+ return false;
+}
+
+static bool dma_buf_poll_excl(struct dma_resv *resv,
+ struct dma_buf_poll_cb_t *dcb)
+{
+ struct dma_fence *fence = dma_resv_excl_fence(resv);
+ int r;
+
+ if (!fence)
+ return false;
+
+ dma_fence_get(fence);
+ r = dma_fence_add_callback(fence, &dcb->cb, dma_buf_poll_cb);
+ if (!r)
+ return true;
+ dma_fence_put(fence);
+
+ return false;
}
static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
{
struct dma_buf *dmabuf;
struct dma_resv *resv;
- struct dma_resv_list *fobj;
- struct dma_fence *fence_excl;
__poll_t events;
- unsigned shared_count, seq;
dmabuf = file->private_data;
if (!dmabuf || !dmabuf->resv)
@@ -229,101 +268,50 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
if (!events)
return 0;
-retry:
- seq = read_seqcount_begin(&resv->seq);
- rcu_read_lock();
-
- fobj = rcu_dereference(resv->fence);
- if (fobj)
- shared_count = fobj->shared_count;
- else
- shared_count = 0;
- fence_excl = dma_resv_excl_fence(resv);
- if (read_seqcount_retry(&resv->seq, seq)) {
- rcu_read_unlock();
- goto retry;
- }
-
- if (fence_excl && (!(events & EPOLLOUT) || shared_count == 0)) {
- struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_excl;
- __poll_t pevents = EPOLLIN;
+ dma_resv_lock(resv, NULL);
- if (shared_count == 0)
- pevents |= EPOLLOUT;
+ if (events & EPOLLOUT) {
+ struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_out;
+ /* Check that callback isn't busy */
spin_lock_irq(&dmabuf->poll.lock);
- if (dcb->active) {
- dcb->active |= pevents;
- events &= ~pevents;
- } else
- dcb->active = pevents;
+ if (dcb->active)
+ events &= ~EPOLLOUT;
+ else
+ dcb->active = EPOLLOUT;
spin_unlock_irq(&dmabuf->poll.lock);
- if (events & pevents) {
- if (!dma_fence_get_rcu(fence_excl)) {
- /* force a recheck */
- events &= ~pevents;
- dma_buf_poll_cb(NULL, &dcb->cb);
- } else if (!dma_fence_add_callback(fence_excl, &dcb->cb,
- dma_buf_poll_cb)) {
- events &= ~pevents;
- dma_fence_put(fence_excl);
- } else {
- /*
- * No callback queued, wake up any additional
- * waiters.
- */
- dma_fence_put(fence_excl);
+ if (events & EPOLLOUT) {
+ if (!dma_buf_poll_shared(resv, dcb) &&
+ !dma_buf_poll_excl(resv, dcb))
+ /* No callback queued, wake up any other waiters */
dma_buf_poll_cb(NULL, &dcb->cb);
- }
+ else
+ events &= ~EPOLLOUT;
}
}
- if ((events & EPOLLOUT) && shared_count > 0) {
- struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_shared;
- int i;
+ if (events & EPOLLIN) {
+ struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_in;
- /* Only queue a new callback if no event has fired yet */
+ /* Check that callback isn't busy */
spin_lock_irq(&dmabuf->poll.lock);
if (dcb->active)
- events &= ~EPOLLOUT;
+ events &= ~EPOLLIN;
else
- dcb->active = EPOLLOUT;
+ dcb->active = EPOLLIN;
spin_unlock_irq(&dmabuf->poll.lock);
- if (!(events & EPOLLOUT))
- goto out;
-
- for (i = 0; i < shared_count; ++i) {
- struct dma_fence *fence = rcu_dereference(fobj->shared[i]);
-
- if (!dma_fence_get_rcu(fence)) {
- /*
- * fence refcount dropped to zero, this means
- * that fobj has been freed
- *
- * call dma_buf_poll_cb and force a recheck!
- */
- events &= ~EPOLLOUT;
+ if (events & EPOLLIN) {
+ if (!dma_buf_poll_excl(resv, dcb))
+ /* No callback queued, wake up any other waiters */
dma_buf_poll_cb(NULL, &dcb->cb);
- break;
- }
- if (!dma_fence_add_callback(fence, &dcb->cb,
- dma_buf_poll_cb)) {
- dma_fence_put(fence);
- events &= ~EPOLLOUT;
- break;
- }
- dma_fence_put(fence);
+ else
+ events &= ~EPOLLIN;
}
-
- /* No callback queued, wake up any additional waiters. */
- if (i == shared_count)
- dma_buf_poll_cb(NULL, &dcb->cb);
}
-out:
- rcu_read_unlock();
+ dma_resv_unlock(resv);
return events;
}
@@ -566,8 +554,8 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
dmabuf->owner = exp_info->owner;
spin_lock_init(&dmabuf->name_lock);
init_waitqueue_head(&dmabuf->poll);
- dmabuf->cb_excl.poll = dmabuf->cb_shared.poll = &dmabuf->poll;
- dmabuf->cb_excl.active = dmabuf->cb_shared.active = 0;
+ dmabuf->cb_in.poll = dmabuf->cb_out.poll = &dmabuf->poll;
+ dmabuf->cb_in.active = dmabuf->cb_out.active = 0;
if (!resv) {
resv = (struct dma_resv *)&dmabuf[1];
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 66470c37e471..02c2eb874da6 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -440,7 +440,7 @@ struct dma_buf {
wait_queue_head_t *poll;
__poll_t active;
- } cb_excl, cb_shared;
+ } cb_in, cb_out;
#ifdef CONFIG_DMABUF_SYSFS_STATS
/**
* @sysfs_entry:
The patch below does not apply to the 5.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6b51b02a3a0ac49dfe302818d0746a799545e4e9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20K=C3=B6nig?= <christian.koenig(a)amd.com>
Date: Tue, 15 Jun 2021 13:12:33 +0200
Subject: [PATCH] dma-buf: fix and rework dma_buf_poll v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Daniel pointed me towards this function and there are multiple obvious problems
in the implementation.
First of all the retry loop is not working as intended. In general the retry
makes only sense if you grab the reference first and then check the sequence
values.
Then we should always also wait for the exclusive fence.
It's also good practice to keep the reference around when installing callbacks
to fences you don't own.
And last the whole implementation was unnecessary complex and rather hard to
understand which could lead to probably unexpected behavior of the IOCTL.
Fix all this by reworking the implementation from scratch. Dropping the
whole RCU approach and taking the lock instead.
Only mildly tested and needs a thoughtful review of the code.
Pushing through drm-misc-next to avoid merge conflicts and give the code
another round of testing.
v2: fix the reference counting as well
v3: keep the excl fence handling as is for stable
v4: back to testing all fences, drop RCU
v5: handle in and out separately
v6: add missing clear of events
v7: change coding style as suggested by Michel, drop unused variables
Signed-off-by: Christian König <christian.koenig(a)amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Tested-by: Michel Dänzer <mdaenzer(a)redhat.com>
CC: stable(a)vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20210720131110.88512-1-christ…
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 474de2d988ca..61e20ae7b08b 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -74,7 +74,7 @@ static void dma_buf_release(struct dentry *dentry)
* If you hit this BUG() it means someone dropped their ref to the
* dma-buf while still having pending operation to the buffer.
*/
- BUG_ON(dmabuf->cb_shared.active || dmabuf->cb_excl.active);
+ BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
dma_buf_stats_teardown(dmabuf);
dmabuf->ops->release(dmabuf);
@@ -206,16 +206,55 @@ static void dma_buf_poll_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
wake_up_locked_poll(dcb->poll, dcb->active);
dcb->active = 0;
spin_unlock_irqrestore(&dcb->poll->lock, flags);
+ dma_fence_put(fence);
+}
+
+static bool dma_buf_poll_shared(struct dma_resv *resv,
+ struct dma_buf_poll_cb_t *dcb)
+{
+ struct dma_resv_list *fobj = dma_resv_shared_list(resv);
+ struct dma_fence *fence;
+ int i, r;
+
+ if (!fobj)
+ return false;
+
+ for (i = 0; i < fobj->shared_count; ++i) {
+ fence = rcu_dereference_protected(fobj->shared[i],
+ dma_resv_held(resv));
+ dma_fence_get(fence);
+ r = dma_fence_add_callback(fence, &dcb->cb, dma_buf_poll_cb);
+ if (!r)
+ return true;
+ dma_fence_put(fence);
+ }
+
+ return false;
+}
+
+static bool dma_buf_poll_excl(struct dma_resv *resv,
+ struct dma_buf_poll_cb_t *dcb)
+{
+ struct dma_fence *fence = dma_resv_excl_fence(resv);
+ int r;
+
+ if (!fence)
+ return false;
+
+ dma_fence_get(fence);
+ r = dma_fence_add_callback(fence, &dcb->cb, dma_buf_poll_cb);
+ if (!r)
+ return true;
+ dma_fence_put(fence);
+
+ return false;
}
static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
{
struct dma_buf *dmabuf;
struct dma_resv *resv;
- struct dma_resv_list *fobj;
- struct dma_fence *fence_excl;
__poll_t events;
- unsigned shared_count, seq;
dmabuf = file->private_data;
if (!dmabuf || !dmabuf->resv)
@@ -229,101 +268,50 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
if (!events)
return 0;
-retry:
- seq = read_seqcount_begin(&resv->seq);
- rcu_read_lock();
-
- fobj = rcu_dereference(resv->fence);
- if (fobj)
- shared_count = fobj->shared_count;
- else
- shared_count = 0;
- fence_excl = dma_resv_excl_fence(resv);
- if (read_seqcount_retry(&resv->seq, seq)) {
- rcu_read_unlock();
- goto retry;
- }
-
- if (fence_excl && (!(events & EPOLLOUT) || shared_count == 0)) {
- struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_excl;
- __poll_t pevents = EPOLLIN;
+ dma_resv_lock(resv, NULL);
- if (shared_count == 0)
- pevents |= EPOLLOUT;
+ if (events & EPOLLOUT) {
+ struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_out;
+ /* Check that callback isn't busy */
spin_lock_irq(&dmabuf->poll.lock);
- if (dcb->active) {
- dcb->active |= pevents;
- events &= ~pevents;
- } else
- dcb->active = pevents;
+ if (dcb->active)
+ events &= ~EPOLLOUT;
+ else
+ dcb->active = EPOLLOUT;
spin_unlock_irq(&dmabuf->poll.lock);
- if (events & pevents) {
- if (!dma_fence_get_rcu(fence_excl)) {
- /* force a recheck */
- events &= ~pevents;
- dma_buf_poll_cb(NULL, &dcb->cb);
- } else if (!dma_fence_add_callback(fence_excl, &dcb->cb,
- dma_buf_poll_cb)) {
- events &= ~pevents;
- dma_fence_put(fence_excl);
- } else {
- /*
- * No callback queued, wake up any additional
- * waiters.
- */
- dma_fence_put(fence_excl);
+ if (events & EPOLLOUT) {
+ if (!dma_buf_poll_shared(resv, dcb) &&
+ !dma_buf_poll_excl(resv, dcb))
+ /* No callback queued, wake up any other waiters */
dma_buf_poll_cb(NULL, &dcb->cb);
- }
+ else
+ events &= ~EPOLLOUT;
}
}
- if ((events & EPOLLOUT) && shared_count > 0) {
- struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_shared;
- int i;
+ if (events & EPOLLIN) {
+ struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_in;
- /* Only queue a new callback if no event has fired yet */
+ /* Check that callback isn't busy */
spin_lock_irq(&dmabuf->poll.lock);
if (dcb->active)
- events &= ~EPOLLOUT;
+ events &= ~EPOLLIN;
else
- dcb->active = EPOLLOUT;
+ dcb->active = EPOLLIN;
spin_unlock_irq(&dmabuf->poll.lock);
- if (!(events & EPOLLOUT))
- goto out;
-
- for (i = 0; i < shared_count; ++i) {
- struct dma_fence *fence = rcu_dereference(fobj->shared[i]);
-
- if (!dma_fence_get_rcu(fence)) {
- /*
- * fence refcount dropped to zero, this means
- * that fobj has been freed
- *
- * call dma_buf_poll_cb and force a recheck!
- */
- events &= ~EPOLLOUT;
+ if (events & EPOLLIN) {
+ if (!dma_buf_poll_excl(resv, dcb))
+ /* No callback queued, wake up any other waiters */
dma_buf_poll_cb(NULL, &dcb->cb);
- break;
- }
- if (!dma_fence_add_callback(fence, &dcb->cb,
- dma_buf_poll_cb)) {
- dma_fence_put(fence);
- events &= ~EPOLLOUT;
- break;
- }
- dma_fence_put(fence);
+ else
+ events &= ~EPOLLIN;
}
-
- /* No callback queued, wake up any additional waiters. */
- if (i == shared_count)
- dma_buf_poll_cb(NULL, &dcb->cb);
}
-out:
- rcu_read_unlock();
+ dma_resv_unlock(resv);
return events;
}
@@ -566,8 +554,8 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
dmabuf->owner = exp_info->owner;
spin_lock_init(&dmabuf->name_lock);
init_waitqueue_head(&dmabuf->poll);
- dmabuf->cb_excl.poll = dmabuf->cb_shared.poll = &dmabuf->poll;
- dmabuf->cb_excl.active = dmabuf->cb_shared.active = 0;
+ dmabuf->cb_in.poll = dmabuf->cb_out.poll = &dmabuf->poll;
+ dmabuf->cb_in.active = dmabuf->cb_out.active = 0;
if (!resv) {
resv = (struct dma_resv *)&dmabuf[1];
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 66470c37e471..02c2eb874da6 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -440,7 +440,7 @@ struct dma_buf {
wait_queue_head_t *poll;
__poll_t active;
- } cb_excl, cb_shared;
+ } cb_in, cb_out;
#ifdef CONFIG_DMABUF_SYSFS_STATS
/**
* @sysfs_entry:
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 928265e3601cde78c7e0a3e518a93b27defed3b1 Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Date: Fri, 22 Oct 2021 14:58:23 +0200
Subject: [PATCH] PM: sleep: Do not let "syscore" devices runtime-suspend
during system transitions
There is no reason to allow "syscore" devices to runtime-suspend
during system-wide PM transitions, because they are subject to the
same possible failure modes as any other devices in that respect.
Accordingly, change device_prepare() and device_complete() to call
pm_runtime_get_noresume() and pm_runtime_put(), respectively, for
"syscore" devices too.
Fixes: 057d51a1268f ("Merge branch 'pm-sleep'")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: 3.10+ <stable(a)vger.kernel.org> # 3.10+
Reviewed-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index cbea78e79f3d..fca6eab871fc 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1051,7 +1051,7 @@ static void device_complete(struct device *dev, pm_message_t state)
const char *info = NULL;
if (dev->power.syscore)
- return;
+ goto out;
device_lock(dev);
@@ -1081,6 +1081,7 @@ static void device_complete(struct device *dev, pm_message_t state)
device_unlock(dev);
+out:
pm_runtime_put(dev);
}
@@ -1794,9 +1795,6 @@ static int device_prepare(struct device *dev, pm_message_t state)
int (*callback)(struct device *) = NULL;
int ret = 0;
- if (dev->power.syscore)
- return 0;
-
/*
* If a device's parent goes into runtime suspend at the wrong time,
* it won't be possible to resume the device. To prevent this we
@@ -1805,6 +1803,9 @@ static int device_prepare(struct device *dev, pm_message_t state)
*/
pm_runtime_get_noresume(dev);
+ if (dev->power.syscore)
+ return 0;
+
device_lock(dev);
dev->power.wakeup_path = false;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 928265e3601cde78c7e0a3e518a93b27defed3b1 Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Date: Fri, 22 Oct 2021 14:58:23 +0200
Subject: [PATCH] PM: sleep: Do not let "syscore" devices runtime-suspend
during system transitions
There is no reason to allow "syscore" devices to runtime-suspend
during system-wide PM transitions, because they are subject to the
same possible failure modes as any other devices in that respect.
Accordingly, change device_prepare() and device_complete() to call
pm_runtime_get_noresume() and pm_runtime_put(), respectively, for
"syscore" devices too.
Fixes: 057d51a1268f ("Merge branch 'pm-sleep'")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: 3.10+ <stable(a)vger.kernel.org> # 3.10+
Reviewed-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index cbea78e79f3d..fca6eab871fc 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1051,7 +1051,7 @@ static void device_complete(struct device *dev, pm_message_t state)
const char *info = NULL;
if (dev->power.syscore)
- return;
+ goto out;
device_lock(dev);
@@ -1081,6 +1081,7 @@ static void device_complete(struct device *dev, pm_message_t state)
device_unlock(dev);
+out:
pm_runtime_put(dev);
}
@@ -1794,9 +1795,6 @@ static int device_prepare(struct device *dev, pm_message_t state)
int (*callback)(struct device *) = NULL;
int ret = 0;
- if (dev->power.syscore)
- return 0;
-
/*
* If a device's parent goes into runtime suspend at the wrong time,
* it won't be possible to resume the device. To prevent this we
@@ -1805,6 +1803,9 @@ static int device_prepare(struct device *dev, pm_message_t state)
*/
pm_runtime_get_noresume(dev);
+ if (dev->power.syscore)
+ return 0;
+
device_lock(dev);
dev->power.wakeup_path = false;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 928265e3601cde78c7e0a3e518a93b27defed3b1 Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Date: Fri, 22 Oct 2021 14:58:23 +0200
Subject: [PATCH] PM: sleep: Do not let "syscore" devices runtime-suspend
during system transitions
There is no reason to allow "syscore" devices to runtime-suspend
during system-wide PM transitions, because they are subject to the
same possible failure modes as any other devices in that respect.
Accordingly, change device_prepare() and device_complete() to call
pm_runtime_get_noresume() and pm_runtime_put(), respectively, for
"syscore" devices too.
Fixes: 057d51a1268f ("Merge branch 'pm-sleep'")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: 3.10+ <stable(a)vger.kernel.org> # 3.10+
Reviewed-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index cbea78e79f3d..fca6eab871fc 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1051,7 +1051,7 @@ static void device_complete(struct device *dev, pm_message_t state)
const char *info = NULL;
if (dev->power.syscore)
- return;
+ goto out;
device_lock(dev);
@@ -1081,6 +1081,7 @@ static void device_complete(struct device *dev, pm_message_t state)
device_unlock(dev);
+out:
pm_runtime_put(dev);
}
@@ -1794,9 +1795,6 @@ static int device_prepare(struct device *dev, pm_message_t state)
int (*callback)(struct device *) = NULL;
int ret = 0;
- if (dev->power.syscore)
- return 0;
-
/*
* If a device's parent goes into runtime suspend at the wrong time,
* it won't be possible to resume the device. To prevent this we
@@ -1805,6 +1803,9 @@ static int device_prepare(struct device *dev, pm_message_t state)
*/
pm_runtime_get_noresume(dev);
+ if (dev->power.syscore)
+ return 0;
+
device_lock(dev);
dev->power.wakeup_path = false;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 928265e3601cde78c7e0a3e518a93b27defed3b1 Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Date: Fri, 22 Oct 2021 14:58:23 +0200
Subject: [PATCH] PM: sleep: Do not let "syscore" devices runtime-suspend
during system transitions
There is no reason to allow "syscore" devices to runtime-suspend
during system-wide PM transitions, because they are subject to the
same possible failure modes as any other devices in that respect.
Accordingly, change device_prepare() and device_complete() to call
pm_runtime_get_noresume() and pm_runtime_put(), respectively, for
"syscore" devices too.
Fixes: 057d51a1268f ("Merge branch 'pm-sleep'")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: 3.10+ <stable(a)vger.kernel.org> # 3.10+
Reviewed-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index cbea78e79f3d..fca6eab871fc 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1051,7 +1051,7 @@ static void device_complete(struct device *dev, pm_message_t state)
const char *info = NULL;
if (dev->power.syscore)
- return;
+ goto out;
device_lock(dev);
@@ -1081,6 +1081,7 @@ static void device_complete(struct device *dev, pm_message_t state)
device_unlock(dev);
+out:
pm_runtime_put(dev);
}
@@ -1794,9 +1795,6 @@ static int device_prepare(struct device *dev, pm_message_t state)
int (*callback)(struct device *) = NULL;
int ret = 0;
- if (dev->power.syscore)
- return 0;
-
/*
* If a device's parent goes into runtime suspend at the wrong time,
* it won't be possible to resume the device. To prevent this we
@@ -1805,6 +1803,9 @@ static int device_prepare(struct device *dev, pm_message_t state)
*/
pm_runtime_get_noresume(dev);
+ if (dev->power.syscore)
+ return 0;
+
device_lock(dev);
dev->power.wakeup_path = false;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 928265e3601cde78c7e0a3e518a93b27defed3b1 Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Date: Fri, 22 Oct 2021 14:58:23 +0200
Subject: [PATCH] PM: sleep: Do not let "syscore" devices runtime-suspend
during system transitions
There is no reason to allow "syscore" devices to runtime-suspend
during system-wide PM transitions, because they are subject to the
same possible failure modes as any other devices in that respect.
Accordingly, change device_prepare() and device_complete() to call
pm_runtime_get_noresume() and pm_runtime_put(), respectively, for
"syscore" devices too.
Fixes: 057d51a1268f ("Merge branch 'pm-sleep'")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: 3.10+ <stable(a)vger.kernel.org> # 3.10+
Reviewed-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index cbea78e79f3d..fca6eab871fc 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1051,7 +1051,7 @@ static void device_complete(struct device *dev, pm_message_t state)
const char *info = NULL;
if (dev->power.syscore)
- return;
+ goto out;
device_lock(dev);
@@ -1081,6 +1081,7 @@ static void device_complete(struct device *dev, pm_message_t state)
device_unlock(dev);
+out:
pm_runtime_put(dev);
}
@@ -1794,9 +1795,6 @@ static int device_prepare(struct device *dev, pm_message_t state)
int (*callback)(struct device *) = NULL;
int ret = 0;
- if (dev->power.syscore)
- return 0;
-
/*
* If a device's parent goes into runtime suspend at the wrong time,
* it won't be possible to resume the device. To prevent this we
@@ -1805,6 +1803,9 @@ static int device_prepare(struct device *dev, pm_message_t state)
*/
pm_runtime_get_noresume(dev);
+ if (dev->power.syscore)
+ return 0;
+
device_lock(dev);
dev->power.wakeup_path = false;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3c08b0931eedd04c530040499fadeccab50ed646 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj(a)kernel.org>
Date: Thu, 14 Oct 2021 13:20:22 -1000
Subject: [PATCH] blk-cgroup: blk_cgroup_bio_start() should use irq-safe
operations on blkg->iostat_cpu
c3df5fb57fe8 ("cgroup: rstat: fix A-A deadlock on 32bit around
u64_stats_sync") made u64_stats updates irq-safe to avoid A-A deadlocks.
Unfortunately, the conversion missed one in blk_cgroup_bio_start(). Fix it.
Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat")
Cc: stable(a)vger.kernel.org # v5.13+
Reported-by: syzbot+9738c8815b375ce482a1(a)syzkaller.appspotmail.com
Signed-off-by: Tejun Heo <tj(a)kernel.org>
Link: https://lore.kernel.org/r/YWi7NrQdVlxD6J9W@slm.duckdns.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 38b9f7684952..9a1c5839dd46 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1897,10 +1897,11 @@ void blk_cgroup_bio_start(struct bio *bio)
{
int rwd = blk_cgroup_io_type(bio), cpu;
struct blkg_iostat_set *bis;
+ unsigned long flags;
cpu = get_cpu();
bis = per_cpu_ptr(bio->bi_blkg->iostat_cpu, cpu);
- u64_stats_update_begin(&bis->sync);
+ flags = u64_stats_update_begin_irqsave(&bis->sync);
/*
* If the bio is flagged with BIO_CGROUP_ACCT it means this is a split
@@ -1912,7 +1913,7 @@ void blk_cgroup_bio_start(struct bio *bio)
}
bis->cur.ios[rwd]++;
- u64_stats_update_end(&bis->sync);
+ u64_stats_update_end_irqrestore(&bis->sync, flags);
if (cgroup_subsys_on_dfl(io_cgrp_subsys))
cgroup_rstat_updated(bio->bi_blkg->blkcg->css.cgroup, cpu);
put_cpu();
The patch below does not apply to the 5.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3c08b0931eedd04c530040499fadeccab50ed646 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj(a)kernel.org>
Date: Thu, 14 Oct 2021 13:20:22 -1000
Subject: [PATCH] blk-cgroup: blk_cgroup_bio_start() should use irq-safe
operations on blkg->iostat_cpu
c3df5fb57fe8 ("cgroup: rstat: fix A-A deadlock on 32bit around
u64_stats_sync") made u64_stats updates irq-safe to avoid A-A deadlocks.
Unfortunately, the conversion missed one in blk_cgroup_bio_start(). Fix it.
Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat")
Cc: stable(a)vger.kernel.org # v5.13+
Reported-by: syzbot+9738c8815b375ce482a1(a)syzkaller.appspotmail.com
Signed-off-by: Tejun Heo <tj(a)kernel.org>
Link: https://lore.kernel.org/r/YWi7NrQdVlxD6J9W@slm.duckdns.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 38b9f7684952..9a1c5839dd46 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1897,10 +1897,11 @@ void blk_cgroup_bio_start(struct bio *bio)
{
int rwd = blk_cgroup_io_type(bio), cpu;
struct blkg_iostat_set *bis;
+ unsigned long flags;
cpu = get_cpu();
bis = per_cpu_ptr(bio->bi_blkg->iostat_cpu, cpu);
- u64_stats_update_begin(&bis->sync);
+ flags = u64_stats_update_begin_irqsave(&bis->sync);
/*
* If the bio is flagged with BIO_CGROUP_ACCT it means this is a split
@@ -1912,7 +1913,7 @@ void blk_cgroup_bio_start(struct bio *bio)
}
bis->cur.ios[rwd]++;
- u64_stats_update_end(&bis->sync);
+ u64_stats_update_end_irqrestore(&bis->sync, flags);
if (cgroup_subsys_on_dfl(io_cgrp_subsys))
cgroup_rstat_updated(bio->bi_blkg->blkcg->css.cgroup, cpu);
put_cpu();
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a7fda04bc9b6ad9da8e19c9e6e3b1dab773d068a Mon Sep 17 00:00:00 2001
From: Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
Date: Fri, 8 Oct 2021 13:37:14 +0200
Subject: [PATCH] regulator: dt-bindings: samsung,s5m8767: correct
s5m8767,pmic-buck-default-dvs-idx property
The driver was always parsing "s5m8767,pmic-buck-default-dvs-idx", not
"s5m8767,pmic-buck234-default-dvs-idx".
Cc: <stable(a)vger.kernel.org>
Fixes: 26aec009f6b6 ("regulator: add device tree support for s5m8767")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
Acked-by: Rob Herring <robh(a)kernel.org>
Message-Id: <20211008113723.134648-3-krzysztof.kozlowski(a)canonical.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/Documentation/devicetree/bindings/regulator/samsung,s5m8767.txt b/Documentation/devicetree/bindings/regulator/samsung,s5m8767.txt
index d9cff1614f7a..6cd83d920155 100644
--- a/Documentation/devicetree/bindings/regulator/samsung,s5m8767.txt
+++ b/Documentation/devicetree/bindings/regulator/samsung,s5m8767.txt
@@ -39,7 +39,7 @@ Optional properties of the main device node (the parent!):
Additional properties required if either of the optional properties are used:
- - s5m8767,pmic-buck234-default-dvs-idx: Default voltage setting selected from
+ - s5m8767,pmic-buck-default-dvs-idx: Default voltage setting selected from
the possible 8 options selectable by the dvs gpios. The value of this
property should be between 0 and 7. If not specified or if out of range, the
default value of this property is set to 0.
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b16bef60a9112b1e6daf3afd16484eb06e7ce792 Mon Sep 17 00:00:00 2001
From: Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
Date: Fri, 8 Oct 2021 13:37:13 +0200
Subject: [PATCH] regulator: s5m8767: do not use reset value as DVS voltage if
GPIO DVS is disabled
The driver and its bindings, before commit 04f9f068a619 ("regulator:
s5m8767: Modify parsing method of the voltage table of buck2/3/4") were
requiring to provide at least one safe/default voltage for DVS registers
if DVS GPIO is not being enabled.
IOW, if s5m8767,pmic-buck2-uses-gpio-dvs is missing, the
s5m8767,pmic-buck2-dvs-voltage should still be present and contain one
voltage.
This requirement was coming from driver behavior matching this condition
(none of DVS GPIO is enabled): it was always initializing the DVS
selector pins to 0 and keeping the DVS enable setting at reset value
(enabled). Therefore if none of DVS GPIO is enabled in devicetree,
driver was configuring the first DVS voltage for buck[234].
Mentioned commit 04f9f068a619 ("regulator: s5m8767: Modify parsing
method of the voltage table of buck2/3/4") broke it because DVS voltage
won't be parsed from devicetree if DVS GPIO is not enabled. After the
change, driver will configure bucks to use the register reset value as
voltage which might have unpleasant effects.
Fix this by relaxing the bindings constrain: if DVS GPIO is not enabled
in devicetree (therefore DVS voltage is also not parsed), explicitly
disable it.
Cc: <stable(a)vger.kernel.org>
Fixes: 04f9f068a619 ("regulator: s5m8767: Modify parsing method of the voltage table of buck2/3/4")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
Acked-by: Rob Herring <robh(a)kernel.org>
Message-Id: <20211008113723.134648-2-krzysztof.kozlowski(a)canonical.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/Documentation/devicetree/bindings/regulator/samsung,s5m8767.txt b/Documentation/devicetree/bindings/regulator/samsung,s5m8767.txt
index 093edda0c8df..d9cff1614f7a 100644
--- a/Documentation/devicetree/bindings/regulator/samsung,s5m8767.txt
+++ b/Documentation/devicetree/bindings/regulator/samsung,s5m8767.txt
@@ -13,6 +13,14 @@ common regulator binding documented in:
Required properties of the main device node (the parent!):
+ - s5m8767,pmic-buck-ds-gpios: GPIO specifiers for three host gpio's used
+ for selecting GPIO DVS lines. It is one-to-one mapped to dvs gpio lines.
+
+ [1] If either of the 's5m8767,pmic-buck[2/3/4]-uses-gpio-dvs' optional
+ property is specified, then all the eight voltage values for the
+ 's5m8767,pmic-buck[2/3/4]-dvs-voltage' should be specified.
+
+Optional properties of the main device node (the parent!):
- s5m8767,pmic-buck2-dvs-voltage: A set of 8 voltage values in micro-volt (uV)
units for buck2 when changing voltage using gpio dvs. Refer to [1] below
for additional information.
@@ -25,19 +33,6 @@ Required properties of the main device node (the parent!):
units for buck4 when changing voltage using gpio dvs. Refer to [1] below
for additional information.
- - s5m8767,pmic-buck-ds-gpios: GPIO specifiers for three host gpio's used
- for selecting GPIO DVS lines. It is one-to-one mapped to dvs gpio lines.
-
- [1] If none of the 's5m8767,pmic-buck[2/3/4]-uses-gpio-dvs' optional
- property is specified, the 's5m8767,pmic-buck[2/3/4]-dvs-voltage'
- property should specify atleast one voltage level (which would be a
- safe operating voltage).
-
- If either of the 's5m8767,pmic-buck[2/3/4]-uses-gpio-dvs' optional
- property is specified, then all the eight voltage values for the
- 's5m8767,pmic-buck[2/3/4]-dvs-voltage' should be specified.
-
-Optional properties of the main device node (the parent!):
- s5m8767,pmic-buck2-uses-gpio-dvs: 'buck2' can be controlled by gpio dvs.
- s5m8767,pmic-buck3-uses-gpio-dvs: 'buck3' can be controlled by gpio dvs.
- s5m8767,pmic-buck4-uses-gpio-dvs: 'buck4' can be controlled by gpio dvs.
diff --git a/drivers/regulator/s5m8767.c b/drivers/regulator/s5m8767.c
index 7c111bbdc2af..35269f998210 100644
--- a/drivers/regulator/s5m8767.c
+++ b/drivers/regulator/s5m8767.c
@@ -850,18 +850,15 @@ static int s5m8767_pmic_probe(struct platform_device *pdev)
/* DS4 GPIO */
gpio_direction_output(pdata->buck_ds[2], 0x0);
- if (pdata->buck2_gpiodvs || pdata->buck3_gpiodvs ||
- pdata->buck4_gpiodvs) {
- regmap_update_bits(s5m8767->iodev->regmap_pmic,
- S5M8767_REG_BUCK2CTRL, 1 << 1,
- (pdata->buck2_gpiodvs) ? (1 << 1) : (0 << 1));
- regmap_update_bits(s5m8767->iodev->regmap_pmic,
- S5M8767_REG_BUCK3CTRL, 1 << 1,
- (pdata->buck3_gpiodvs) ? (1 << 1) : (0 << 1));
- regmap_update_bits(s5m8767->iodev->regmap_pmic,
- S5M8767_REG_BUCK4CTRL, 1 << 1,
- (pdata->buck4_gpiodvs) ? (1 << 1) : (0 << 1));
- }
+ regmap_update_bits(s5m8767->iodev->regmap_pmic,
+ S5M8767_REG_BUCK2CTRL, 1 << 1,
+ (pdata->buck2_gpiodvs) ? (1 << 1) : (0 << 1));
+ regmap_update_bits(s5m8767->iodev->regmap_pmic,
+ S5M8767_REG_BUCK3CTRL, 1 << 1,
+ (pdata->buck3_gpiodvs) ? (1 << 1) : (0 << 1));
+ regmap_update_bits(s5m8767->iodev->regmap_pmic,
+ S5M8767_REG_BUCK4CTRL, 1 << 1,
+ (pdata->buck4_gpiodvs) ? (1 << 1) : (0 << 1));
/* Initialize GPIO DVS registers */
for (i = 0; i < 8; i++) {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From db05ddf7f321634c5659a0cf7ea56594e22365f7 Mon Sep 17 00:00:00 2001
From: Corey Minyard <cminyard(a)mvista.com>
Date: Mon, 20 Sep 2021 06:25:37 -0500
Subject: [PATCH] ipmi:watchdog: Set panic count to proper value on a panic
You will get two decrements when the messages on a panic are sent, not
one, since commit 2033f6858970 ("ipmi: Free receive messages when in an
oops") was added, but the watchdog code had a bug where it didn't set
the value properly.
Reported-by: Anton Lundin <glance(a)acc.umu.se>
Cc: <Stable(a)vger.kernel.org> # v5.4+
Fixes: 2033f6858970 ("ipmi: Free receive messages when in an oops")
Signed-off-by: Corey Minyard <cminyard(a)mvista.com>
diff --git a/drivers/char/ipmi/ipmi_watchdog.c b/drivers/char/ipmi/ipmi_watchdog.c
index e4ff3b50de7f..f855a9665c28 100644
--- a/drivers/char/ipmi/ipmi_watchdog.c
+++ b/drivers/char/ipmi/ipmi_watchdog.c
@@ -497,7 +497,7 @@ static void panic_halt_ipmi_heartbeat(void)
msg.cmd = IPMI_WDOG_RESET_TIMER;
msg.data = NULL;
msg.data_len = 0;
- atomic_inc(&panic_done_count);
+ atomic_add(2, &panic_done_count);
rv = ipmi_request_supply_msgs(watchdog_user,
(struct ipmi_addr *) &addr,
0,
@@ -507,7 +507,7 @@ static void panic_halt_ipmi_heartbeat(void)
&panic_halt_heartbeat_recv_msg,
1);
if (rv)
- atomic_dec(&panic_done_count);
+ atomic_sub(2, &panic_done_count);
}
static struct ipmi_smi_msg panic_halt_smi_msg = {
@@ -531,12 +531,12 @@ static void panic_halt_ipmi_set_timeout(void)
/* Wait for the messages to be free. */
while (atomic_read(&panic_done_count) != 0)
ipmi_poll_interface(watchdog_user);
- atomic_inc(&panic_done_count);
+ atomic_add(2, &panic_done_count);
rv = __ipmi_set_timeout(&panic_halt_smi_msg,
&panic_halt_recv_msg,
&send_heartbeat_now);
if (rv) {
- atomic_dec(&panic_done_count);
+ atomic_sub(2, &panic_done_count);
pr_warn("Unable to extend the watchdog timeout\n");
} else {
if (send_heartbeat_now)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From db05ddf7f321634c5659a0cf7ea56594e22365f7 Mon Sep 17 00:00:00 2001
From: Corey Minyard <cminyard(a)mvista.com>
Date: Mon, 20 Sep 2021 06:25:37 -0500
Subject: [PATCH] ipmi:watchdog: Set panic count to proper value on a panic
You will get two decrements when the messages on a panic are sent, not
one, since commit 2033f6858970 ("ipmi: Free receive messages when in an
oops") was added, but the watchdog code had a bug where it didn't set
the value properly.
Reported-by: Anton Lundin <glance(a)acc.umu.se>
Cc: <Stable(a)vger.kernel.org> # v5.4+
Fixes: 2033f6858970 ("ipmi: Free receive messages when in an oops")
Signed-off-by: Corey Minyard <cminyard(a)mvista.com>
diff --git a/drivers/char/ipmi/ipmi_watchdog.c b/drivers/char/ipmi/ipmi_watchdog.c
index e4ff3b50de7f..f855a9665c28 100644
--- a/drivers/char/ipmi/ipmi_watchdog.c
+++ b/drivers/char/ipmi/ipmi_watchdog.c
@@ -497,7 +497,7 @@ static void panic_halt_ipmi_heartbeat(void)
msg.cmd = IPMI_WDOG_RESET_TIMER;
msg.data = NULL;
msg.data_len = 0;
- atomic_inc(&panic_done_count);
+ atomic_add(2, &panic_done_count);
rv = ipmi_request_supply_msgs(watchdog_user,
(struct ipmi_addr *) &addr,
0,
@@ -507,7 +507,7 @@ static void panic_halt_ipmi_heartbeat(void)
&panic_halt_heartbeat_recv_msg,
1);
if (rv)
- atomic_dec(&panic_done_count);
+ atomic_sub(2, &panic_done_count);
}
static struct ipmi_smi_msg panic_halt_smi_msg = {
@@ -531,12 +531,12 @@ static void panic_halt_ipmi_set_timeout(void)
/* Wait for the messages to be free. */
while (atomic_read(&panic_done_count) != 0)
ipmi_poll_interface(watchdog_user);
- atomic_inc(&panic_done_count);
+ atomic_add(2, &panic_done_count);
rv = __ipmi_set_timeout(&panic_halt_smi_msg,
&panic_halt_recv_msg,
&send_heartbeat_now);
if (rv) {
- atomic_dec(&panic_done_count);
+ atomic_sub(2, &panic_done_count);
pr_warn("Unable to extend the watchdog timeout\n");
} else {
if (send_heartbeat_now)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From cbfcd13be5cb2a07868afe67520ed181956579a7 Mon Sep 17 00:00:00 2001
From: Ondrej Mosnacek <omosnace(a)redhat.com>
Date: Wed, 28 Jul 2021 16:03:13 +0200
Subject: [PATCH] selinux: fix race condition when computing ocontext SIDs
Current code contains a lot of racy patterns when converting an
ocontext's context structure to an SID. This is being done in a "lazy"
fashion, such that the SID is looked up in the SID table only when it's
first needed and then cached in the "sid" field of the ocontext
structure. However, this is done without any locking or memory barriers
and is thus unsafe.
Between commits 24ed7fdae669 ("selinux: use separate table for initial
SID lookup") and 66f8e2f03c02 ("selinux: sidtab reverse lookup hash
table"), this race condition lead to an actual observable bug, because a
pointer to the shared sid field was passed directly to
sidtab_context_to_sid(), which was using this location to also store an
intermediate value, which could have been read by other threads and
interpreted as an SID. In practice this caused e.g. new mounts to get a
wrong (seemingly random) filesystem context, leading to strange denials.
This bug has been spotted in the wild at least twice, see [1] and [2].
Fix the race condition by making all the racy functions use a common
helper that ensures the ocontext::sid accesses are made safely using the
appropriate SMP constructs.
Note that security_netif_sid() was populating the sid field of both
contexts stored in the ocontext, but only the first one was actually
used. The SELinux wiki's documentation on the "netifcon" policy
statement [3] suggests that using only the first context is intentional.
I kept only the handling of the first context here, as there is really
no point in doing the SID lookup for the unused one.
I wasn't able to reproduce the bug mentioned above on any kernel that
includes commit 66f8e2f03c02, even though it has been reported that the
issue occurs with that commit, too, just less frequently. Thus, I wasn't
able to verify that this patch fixes the issue, but it makes sense to
avoid the race condition regardless.
[1] https://github.com/containers/container-selinux/issues/89
[2] https://lists.fedoraproject.org/archives/list/selinux@lists.fedoraproject.o…
[3] https://selinuxproject.org/page/NetworkStatements#netifcon
Cc: stable(a)vger.kernel.org
Cc: Xinjie Zheng <xinjie(a)google.com>
Reported-by: Sujithra Periasamy <sujithra(a)google.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ondrej Mosnacek <omosnace(a)redhat.com>
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index e5f1b2757a83..c4931bf6f92a 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -2376,6 +2376,43 @@ err_policy:
return rc;
}
+/**
+ * ocontext_to_sid - Helper to safely get sid for an ocontext
+ * @sidtab: SID table
+ * @c: ocontext structure
+ * @index: index of the context entry (0 or 1)
+ * @out_sid: pointer to the resulting SID value
+ *
+ * For all ocontexts except OCON_ISID the SID fields are populated
+ * on-demand when needed. Since updating the SID value is an SMP-sensitive
+ * operation, this helper must be used to do that safely.
+ *
+ * WARNING: This function may return -ESTALE, indicating that the caller
+ * must retry the operation after re-acquiring the policy pointer!
+ */
+static int ocontext_to_sid(struct sidtab *sidtab, struct ocontext *c,
+ size_t index, u32 *out_sid)
+{
+ int rc;
+ u32 sid;
+
+ /* Ensure the associated sidtab entry is visible to this thread. */
+ sid = smp_load_acquire(&c->sid[index]);
+ if (!sid) {
+ rc = sidtab_context_to_sid(sidtab, &c->context[index], &sid);
+ if (rc)
+ return rc;
+
+ /*
+ * Ensure the new sidtab entry is visible to other threads
+ * when they see the SID.
+ */
+ smp_store_release(&c->sid[index], sid);
+ }
+ *out_sid = sid;
+ return 0;
+}
+
/**
* security_port_sid - Obtain the SID for a port.
* @state: SELinux state
@@ -2414,17 +2451,13 @@ retry:
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *out_sid = c->sid[0];
+ if (rc)
+ goto out;
} else {
*out_sid = SECINITSID_PORT;
}
@@ -2473,18 +2506,13 @@ retry:
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab,
- &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *out_sid = c->sid[0];
+ if (rc)
+ goto out;
} else
*out_sid = SECINITSID_UNLABELED;
@@ -2533,17 +2561,13 @@ retry:
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *out_sid = c->sid[0];
+ if (rc)
+ goto out;
} else
*out_sid = SECINITSID_UNLABELED;
@@ -2587,25 +2611,13 @@ retry:
}
if (c) {
- if (!c->sid[0] || !c->sid[1]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
- rc = sidtab_context_to_sid(sidtab, &c->context[1],
- &c->sid[1]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, if_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *if_sid = c->sid[0];
+ if (rc)
+ goto out;
} else
*if_sid = SECINITSID_NETIF;
@@ -2697,18 +2709,13 @@ retry:
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab,
- &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *out_sid = c->sid[0];
+ if (rc)
+ goto out;
} else {
*out_sid = SECINITSID_NODE;
}
@@ -2873,7 +2880,7 @@ static inline int __security_genfs_sid(struct selinux_policy *policy,
u16 sclass;
struct genfs *genfs;
struct ocontext *c;
- int rc, cmp = 0;
+ int cmp = 0;
while (path[0] == '/' && path[1] == '/')
path++;
@@ -2887,9 +2894,8 @@ static inline int __security_genfs_sid(struct selinux_policy *policy,
break;
}
- rc = -ENOENT;
if (!genfs || cmp)
- goto out;
+ return -ENOENT;
for (c = genfs->head; c; c = c->next) {
len = strlen(c->u.name);
@@ -2898,20 +2904,10 @@ static inline int __security_genfs_sid(struct selinux_policy *policy,
break;
}
- rc = -ENOENT;
if (!c)
- goto out;
-
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0], &c->sid[0]);
- if (rc)
- goto out;
- }
+ return -ENOENT;
- *sid = c->sid[0];
- rc = 0;
-out:
- return rc;
+ return ocontext_to_sid(sidtab, c, 0, sid);
}
/**
@@ -2996,17 +2992,13 @@ retry:
if (c) {
sbsec->behavior = c->v.behavior;
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, &sbsec->sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- sbsec->sid = c->sid[0];
+ if (rc)
+ goto out;
} else {
rc = __security_genfs_sid(policy, fstype, "/",
SECCLASS_DIR, &sbsec->sid);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From cbfcd13be5cb2a07868afe67520ed181956579a7 Mon Sep 17 00:00:00 2001
From: Ondrej Mosnacek <omosnace(a)redhat.com>
Date: Wed, 28 Jul 2021 16:03:13 +0200
Subject: [PATCH] selinux: fix race condition when computing ocontext SIDs
Current code contains a lot of racy patterns when converting an
ocontext's context structure to an SID. This is being done in a "lazy"
fashion, such that the SID is looked up in the SID table only when it's
first needed and then cached in the "sid" field of the ocontext
structure. However, this is done without any locking or memory barriers
and is thus unsafe.
Between commits 24ed7fdae669 ("selinux: use separate table for initial
SID lookup") and 66f8e2f03c02 ("selinux: sidtab reverse lookup hash
table"), this race condition lead to an actual observable bug, because a
pointer to the shared sid field was passed directly to
sidtab_context_to_sid(), which was using this location to also store an
intermediate value, which could have been read by other threads and
interpreted as an SID. In practice this caused e.g. new mounts to get a
wrong (seemingly random) filesystem context, leading to strange denials.
This bug has been spotted in the wild at least twice, see [1] and [2].
Fix the race condition by making all the racy functions use a common
helper that ensures the ocontext::sid accesses are made safely using the
appropriate SMP constructs.
Note that security_netif_sid() was populating the sid field of both
contexts stored in the ocontext, but only the first one was actually
used. The SELinux wiki's documentation on the "netifcon" policy
statement [3] suggests that using only the first context is intentional.
I kept only the handling of the first context here, as there is really
no point in doing the SID lookup for the unused one.
I wasn't able to reproduce the bug mentioned above on any kernel that
includes commit 66f8e2f03c02, even though it has been reported that the
issue occurs with that commit, too, just less frequently. Thus, I wasn't
able to verify that this patch fixes the issue, but it makes sense to
avoid the race condition regardless.
[1] https://github.com/containers/container-selinux/issues/89
[2] https://lists.fedoraproject.org/archives/list/selinux@lists.fedoraproject.o…
[3] https://selinuxproject.org/page/NetworkStatements#netifcon
Cc: stable(a)vger.kernel.org
Cc: Xinjie Zheng <xinjie(a)google.com>
Reported-by: Sujithra Periasamy <sujithra(a)google.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ondrej Mosnacek <omosnace(a)redhat.com>
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index e5f1b2757a83..c4931bf6f92a 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -2376,6 +2376,43 @@ err_policy:
return rc;
}
+/**
+ * ocontext_to_sid - Helper to safely get sid for an ocontext
+ * @sidtab: SID table
+ * @c: ocontext structure
+ * @index: index of the context entry (0 or 1)
+ * @out_sid: pointer to the resulting SID value
+ *
+ * For all ocontexts except OCON_ISID the SID fields are populated
+ * on-demand when needed. Since updating the SID value is an SMP-sensitive
+ * operation, this helper must be used to do that safely.
+ *
+ * WARNING: This function may return -ESTALE, indicating that the caller
+ * must retry the operation after re-acquiring the policy pointer!
+ */
+static int ocontext_to_sid(struct sidtab *sidtab, struct ocontext *c,
+ size_t index, u32 *out_sid)
+{
+ int rc;
+ u32 sid;
+
+ /* Ensure the associated sidtab entry is visible to this thread. */
+ sid = smp_load_acquire(&c->sid[index]);
+ if (!sid) {
+ rc = sidtab_context_to_sid(sidtab, &c->context[index], &sid);
+ if (rc)
+ return rc;
+
+ /*
+ * Ensure the new sidtab entry is visible to other threads
+ * when they see the SID.
+ */
+ smp_store_release(&c->sid[index], sid);
+ }
+ *out_sid = sid;
+ return 0;
+}
+
/**
* security_port_sid - Obtain the SID for a port.
* @state: SELinux state
@@ -2414,17 +2451,13 @@ retry:
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *out_sid = c->sid[0];
+ if (rc)
+ goto out;
} else {
*out_sid = SECINITSID_PORT;
}
@@ -2473,18 +2506,13 @@ retry:
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab,
- &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *out_sid = c->sid[0];
+ if (rc)
+ goto out;
} else
*out_sid = SECINITSID_UNLABELED;
@@ -2533,17 +2561,13 @@ retry:
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *out_sid = c->sid[0];
+ if (rc)
+ goto out;
} else
*out_sid = SECINITSID_UNLABELED;
@@ -2587,25 +2611,13 @@ retry:
}
if (c) {
- if (!c->sid[0] || !c->sid[1]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
- rc = sidtab_context_to_sid(sidtab, &c->context[1],
- &c->sid[1]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, if_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *if_sid = c->sid[0];
+ if (rc)
+ goto out;
} else
*if_sid = SECINITSID_NETIF;
@@ -2697,18 +2709,13 @@ retry:
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab,
- &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *out_sid = c->sid[0];
+ if (rc)
+ goto out;
} else {
*out_sid = SECINITSID_NODE;
}
@@ -2873,7 +2880,7 @@ static inline int __security_genfs_sid(struct selinux_policy *policy,
u16 sclass;
struct genfs *genfs;
struct ocontext *c;
- int rc, cmp = 0;
+ int cmp = 0;
while (path[0] == '/' && path[1] == '/')
path++;
@@ -2887,9 +2894,8 @@ static inline int __security_genfs_sid(struct selinux_policy *policy,
break;
}
- rc = -ENOENT;
if (!genfs || cmp)
- goto out;
+ return -ENOENT;
for (c = genfs->head; c; c = c->next) {
len = strlen(c->u.name);
@@ -2898,20 +2904,10 @@ static inline int __security_genfs_sid(struct selinux_policy *policy,
break;
}
- rc = -ENOENT;
if (!c)
- goto out;
-
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0], &c->sid[0]);
- if (rc)
- goto out;
- }
+ return -ENOENT;
- *sid = c->sid[0];
- rc = 0;
-out:
- return rc;
+ return ocontext_to_sid(sidtab, c, 0, sid);
}
/**
@@ -2996,17 +2992,13 @@ retry:
if (c) {
sbsec->behavior = c->v.behavior;
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, &sbsec->sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- sbsec->sid = c->sid[0];
+ if (rc)
+ goto out;
} else {
rc = __security_genfs_sid(policy, fstype, "/",
SECCLASS_DIR, &sbsec->sid);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From cbfcd13be5cb2a07868afe67520ed181956579a7 Mon Sep 17 00:00:00 2001
From: Ondrej Mosnacek <omosnace(a)redhat.com>
Date: Wed, 28 Jul 2021 16:03:13 +0200
Subject: [PATCH] selinux: fix race condition when computing ocontext SIDs
Current code contains a lot of racy patterns when converting an
ocontext's context structure to an SID. This is being done in a "lazy"
fashion, such that the SID is looked up in the SID table only when it's
first needed and then cached in the "sid" field of the ocontext
structure. However, this is done without any locking or memory barriers
and is thus unsafe.
Between commits 24ed7fdae669 ("selinux: use separate table for initial
SID lookup") and 66f8e2f03c02 ("selinux: sidtab reverse lookup hash
table"), this race condition lead to an actual observable bug, because a
pointer to the shared sid field was passed directly to
sidtab_context_to_sid(), which was using this location to also store an
intermediate value, which could have been read by other threads and
interpreted as an SID. In practice this caused e.g. new mounts to get a
wrong (seemingly random) filesystem context, leading to strange denials.
This bug has been spotted in the wild at least twice, see [1] and [2].
Fix the race condition by making all the racy functions use a common
helper that ensures the ocontext::sid accesses are made safely using the
appropriate SMP constructs.
Note that security_netif_sid() was populating the sid field of both
contexts stored in the ocontext, but only the first one was actually
used. The SELinux wiki's documentation on the "netifcon" policy
statement [3] suggests that using only the first context is intentional.
I kept only the handling of the first context here, as there is really
no point in doing the SID lookup for the unused one.
I wasn't able to reproduce the bug mentioned above on any kernel that
includes commit 66f8e2f03c02, even though it has been reported that the
issue occurs with that commit, too, just less frequently. Thus, I wasn't
able to verify that this patch fixes the issue, but it makes sense to
avoid the race condition regardless.
[1] https://github.com/containers/container-selinux/issues/89
[2] https://lists.fedoraproject.org/archives/list/selinux@lists.fedoraproject.o…
[3] https://selinuxproject.org/page/NetworkStatements#netifcon
Cc: stable(a)vger.kernel.org
Cc: Xinjie Zheng <xinjie(a)google.com>
Reported-by: Sujithra Periasamy <sujithra(a)google.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ondrej Mosnacek <omosnace(a)redhat.com>
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index e5f1b2757a83..c4931bf6f92a 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -2376,6 +2376,43 @@ err_policy:
return rc;
}
+/**
+ * ocontext_to_sid - Helper to safely get sid for an ocontext
+ * @sidtab: SID table
+ * @c: ocontext structure
+ * @index: index of the context entry (0 or 1)
+ * @out_sid: pointer to the resulting SID value
+ *
+ * For all ocontexts except OCON_ISID the SID fields are populated
+ * on-demand when needed. Since updating the SID value is an SMP-sensitive
+ * operation, this helper must be used to do that safely.
+ *
+ * WARNING: This function may return -ESTALE, indicating that the caller
+ * must retry the operation after re-acquiring the policy pointer!
+ */
+static int ocontext_to_sid(struct sidtab *sidtab, struct ocontext *c,
+ size_t index, u32 *out_sid)
+{
+ int rc;
+ u32 sid;
+
+ /* Ensure the associated sidtab entry is visible to this thread. */
+ sid = smp_load_acquire(&c->sid[index]);
+ if (!sid) {
+ rc = sidtab_context_to_sid(sidtab, &c->context[index], &sid);
+ if (rc)
+ return rc;
+
+ /*
+ * Ensure the new sidtab entry is visible to other threads
+ * when they see the SID.
+ */
+ smp_store_release(&c->sid[index], sid);
+ }
+ *out_sid = sid;
+ return 0;
+}
+
/**
* security_port_sid - Obtain the SID for a port.
* @state: SELinux state
@@ -2414,17 +2451,13 @@ retry:
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *out_sid = c->sid[0];
+ if (rc)
+ goto out;
} else {
*out_sid = SECINITSID_PORT;
}
@@ -2473,18 +2506,13 @@ retry:
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab,
- &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *out_sid = c->sid[0];
+ if (rc)
+ goto out;
} else
*out_sid = SECINITSID_UNLABELED;
@@ -2533,17 +2561,13 @@ retry:
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *out_sid = c->sid[0];
+ if (rc)
+ goto out;
} else
*out_sid = SECINITSID_UNLABELED;
@@ -2587,25 +2611,13 @@ retry:
}
if (c) {
- if (!c->sid[0] || !c->sid[1]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
- rc = sidtab_context_to_sid(sidtab, &c->context[1],
- &c->sid[1]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, if_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *if_sid = c->sid[0];
+ if (rc)
+ goto out;
} else
*if_sid = SECINITSID_NETIF;
@@ -2697,18 +2709,13 @@ retry:
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab,
- &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *out_sid = c->sid[0];
+ if (rc)
+ goto out;
} else {
*out_sid = SECINITSID_NODE;
}
@@ -2873,7 +2880,7 @@ static inline int __security_genfs_sid(struct selinux_policy *policy,
u16 sclass;
struct genfs *genfs;
struct ocontext *c;
- int rc, cmp = 0;
+ int cmp = 0;
while (path[0] == '/' && path[1] == '/')
path++;
@@ -2887,9 +2894,8 @@ static inline int __security_genfs_sid(struct selinux_policy *policy,
break;
}
- rc = -ENOENT;
if (!genfs || cmp)
- goto out;
+ return -ENOENT;
for (c = genfs->head; c; c = c->next) {
len = strlen(c->u.name);
@@ -2898,20 +2904,10 @@ static inline int __security_genfs_sid(struct selinux_policy *policy,
break;
}
- rc = -ENOENT;
if (!c)
- goto out;
-
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0], &c->sid[0]);
- if (rc)
- goto out;
- }
+ return -ENOENT;
- *sid = c->sid[0];
- rc = 0;
-out:
- return rc;
+ return ocontext_to_sid(sidtab, c, 0, sid);
}
/**
@@ -2996,17 +2992,13 @@ retry:
if (c) {
sbsec->behavior = c->v.behavior;
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, &sbsec->sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- sbsec->sid = c->sid[0];
+ if (rc)
+ goto out;
} else {
rc = __security_genfs_sid(policy, fstype, "/",
SECCLASS_DIR, &sbsec->sid);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From cbfcd13be5cb2a07868afe67520ed181956579a7 Mon Sep 17 00:00:00 2001
From: Ondrej Mosnacek <omosnace(a)redhat.com>
Date: Wed, 28 Jul 2021 16:03:13 +0200
Subject: [PATCH] selinux: fix race condition when computing ocontext SIDs
Current code contains a lot of racy patterns when converting an
ocontext's context structure to an SID. This is being done in a "lazy"
fashion, such that the SID is looked up in the SID table only when it's
first needed and then cached in the "sid" field of the ocontext
structure. However, this is done without any locking or memory barriers
and is thus unsafe.
Between commits 24ed7fdae669 ("selinux: use separate table for initial
SID lookup") and 66f8e2f03c02 ("selinux: sidtab reverse lookup hash
table"), this race condition lead to an actual observable bug, because a
pointer to the shared sid field was passed directly to
sidtab_context_to_sid(), which was using this location to also store an
intermediate value, which could have been read by other threads and
interpreted as an SID. In practice this caused e.g. new mounts to get a
wrong (seemingly random) filesystem context, leading to strange denials.
This bug has been spotted in the wild at least twice, see [1] and [2].
Fix the race condition by making all the racy functions use a common
helper that ensures the ocontext::sid accesses are made safely using the
appropriate SMP constructs.
Note that security_netif_sid() was populating the sid field of both
contexts stored in the ocontext, but only the first one was actually
used. The SELinux wiki's documentation on the "netifcon" policy
statement [3] suggests that using only the first context is intentional.
I kept only the handling of the first context here, as there is really
no point in doing the SID lookup for the unused one.
I wasn't able to reproduce the bug mentioned above on any kernel that
includes commit 66f8e2f03c02, even though it has been reported that the
issue occurs with that commit, too, just less frequently. Thus, I wasn't
able to verify that this patch fixes the issue, but it makes sense to
avoid the race condition regardless.
[1] https://github.com/containers/container-selinux/issues/89
[2] https://lists.fedoraproject.org/archives/list/selinux@lists.fedoraproject.o…
[3] https://selinuxproject.org/page/NetworkStatements#netifcon
Cc: stable(a)vger.kernel.org
Cc: Xinjie Zheng <xinjie(a)google.com>
Reported-by: Sujithra Periasamy <sujithra(a)google.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ondrej Mosnacek <omosnace(a)redhat.com>
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index e5f1b2757a83..c4931bf6f92a 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -2376,6 +2376,43 @@ err_policy:
return rc;
}
+/**
+ * ocontext_to_sid - Helper to safely get sid for an ocontext
+ * @sidtab: SID table
+ * @c: ocontext structure
+ * @index: index of the context entry (0 or 1)
+ * @out_sid: pointer to the resulting SID value
+ *
+ * For all ocontexts except OCON_ISID the SID fields are populated
+ * on-demand when needed. Since updating the SID value is an SMP-sensitive
+ * operation, this helper must be used to do that safely.
+ *
+ * WARNING: This function may return -ESTALE, indicating that the caller
+ * must retry the operation after re-acquiring the policy pointer!
+ */
+static int ocontext_to_sid(struct sidtab *sidtab, struct ocontext *c,
+ size_t index, u32 *out_sid)
+{
+ int rc;
+ u32 sid;
+
+ /* Ensure the associated sidtab entry is visible to this thread. */
+ sid = smp_load_acquire(&c->sid[index]);
+ if (!sid) {
+ rc = sidtab_context_to_sid(sidtab, &c->context[index], &sid);
+ if (rc)
+ return rc;
+
+ /*
+ * Ensure the new sidtab entry is visible to other threads
+ * when they see the SID.
+ */
+ smp_store_release(&c->sid[index], sid);
+ }
+ *out_sid = sid;
+ return 0;
+}
+
/**
* security_port_sid - Obtain the SID for a port.
* @state: SELinux state
@@ -2414,17 +2451,13 @@ retry:
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *out_sid = c->sid[0];
+ if (rc)
+ goto out;
} else {
*out_sid = SECINITSID_PORT;
}
@@ -2473,18 +2506,13 @@ retry:
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab,
- &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *out_sid = c->sid[0];
+ if (rc)
+ goto out;
} else
*out_sid = SECINITSID_UNLABELED;
@@ -2533,17 +2561,13 @@ retry:
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *out_sid = c->sid[0];
+ if (rc)
+ goto out;
} else
*out_sid = SECINITSID_UNLABELED;
@@ -2587,25 +2611,13 @@ retry:
}
if (c) {
- if (!c->sid[0] || !c->sid[1]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
- rc = sidtab_context_to_sid(sidtab, &c->context[1],
- &c->sid[1]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, if_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *if_sid = c->sid[0];
+ if (rc)
+ goto out;
} else
*if_sid = SECINITSID_NETIF;
@@ -2697,18 +2709,13 @@ retry:
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab,
- &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *out_sid = c->sid[0];
+ if (rc)
+ goto out;
} else {
*out_sid = SECINITSID_NODE;
}
@@ -2873,7 +2880,7 @@ static inline int __security_genfs_sid(struct selinux_policy *policy,
u16 sclass;
struct genfs *genfs;
struct ocontext *c;
- int rc, cmp = 0;
+ int cmp = 0;
while (path[0] == '/' && path[1] == '/')
path++;
@@ -2887,9 +2894,8 @@ static inline int __security_genfs_sid(struct selinux_policy *policy,
break;
}
- rc = -ENOENT;
if (!genfs || cmp)
- goto out;
+ return -ENOENT;
for (c = genfs->head; c; c = c->next) {
len = strlen(c->u.name);
@@ -2898,20 +2904,10 @@ static inline int __security_genfs_sid(struct selinux_policy *policy,
break;
}
- rc = -ENOENT;
if (!c)
- goto out;
-
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0], &c->sid[0]);
- if (rc)
- goto out;
- }
+ return -ENOENT;
- *sid = c->sid[0];
- rc = 0;
-out:
- return rc;
+ return ocontext_to_sid(sidtab, c, 0, sid);
}
/**
@@ -2996,17 +2992,13 @@ retry:
if (c) {
sbsec->behavior = c->v.behavior;
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, &sbsec->sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- sbsec->sid = c->sid[0];
+ if (rc)
+ goto out;
} else {
rc = __security_genfs_sid(policy, fstype, "/",
SECCLASS_DIR, &sbsec->sid);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From cbfcd13be5cb2a07868afe67520ed181956579a7 Mon Sep 17 00:00:00 2001
From: Ondrej Mosnacek <omosnace(a)redhat.com>
Date: Wed, 28 Jul 2021 16:03:13 +0200
Subject: [PATCH] selinux: fix race condition when computing ocontext SIDs
Current code contains a lot of racy patterns when converting an
ocontext's context structure to an SID. This is being done in a "lazy"
fashion, such that the SID is looked up in the SID table only when it's
first needed and then cached in the "sid" field of the ocontext
structure. However, this is done without any locking or memory barriers
and is thus unsafe.
Between commits 24ed7fdae669 ("selinux: use separate table for initial
SID lookup") and 66f8e2f03c02 ("selinux: sidtab reverse lookup hash
table"), this race condition lead to an actual observable bug, because a
pointer to the shared sid field was passed directly to
sidtab_context_to_sid(), which was using this location to also store an
intermediate value, which could have been read by other threads and
interpreted as an SID. In practice this caused e.g. new mounts to get a
wrong (seemingly random) filesystem context, leading to strange denials.
This bug has been spotted in the wild at least twice, see [1] and [2].
Fix the race condition by making all the racy functions use a common
helper that ensures the ocontext::sid accesses are made safely using the
appropriate SMP constructs.
Note that security_netif_sid() was populating the sid field of both
contexts stored in the ocontext, but only the first one was actually
used. The SELinux wiki's documentation on the "netifcon" policy
statement [3] suggests that using only the first context is intentional.
I kept only the handling of the first context here, as there is really
no point in doing the SID lookup for the unused one.
I wasn't able to reproduce the bug mentioned above on any kernel that
includes commit 66f8e2f03c02, even though it has been reported that the
issue occurs with that commit, too, just less frequently. Thus, I wasn't
able to verify that this patch fixes the issue, but it makes sense to
avoid the race condition regardless.
[1] https://github.com/containers/container-selinux/issues/89
[2] https://lists.fedoraproject.org/archives/list/selinux@lists.fedoraproject.o…
[3] https://selinuxproject.org/page/NetworkStatements#netifcon
Cc: stable(a)vger.kernel.org
Cc: Xinjie Zheng <xinjie(a)google.com>
Reported-by: Sujithra Periasamy <sujithra(a)google.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ondrej Mosnacek <omosnace(a)redhat.com>
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index e5f1b2757a83..c4931bf6f92a 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -2376,6 +2376,43 @@ err_policy:
return rc;
}
+/**
+ * ocontext_to_sid - Helper to safely get sid for an ocontext
+ * @sidtab: SID table
+ * @c: ocontext structure
+ * @index: index of the context entry (0 or 1)
+ * @out_sid: pointer to the resulting SID value
+ *
+ * For all ocontexts except OCON_ISID the SID fields are populated
+ * on-demand when needed. Since updating the SID value is an SMP-sensitive
+ * operation, this helper must be used to do that safely.
+ *
+ * WARNING: This function may return -ESTALE, indicating that the caller
+ * must retry the operation after re-acquiring the policy pointer!
+ */
+static int ocontext_to_sid(struct sidtab *sidtab, struct ocontext *c,
+ size_t index, u32 *out_sid)
+{
+ int rc;
+ u32 sid;
+
+ /* Ensure the associated sidtab entry is visible to this thread. */
+ sid = smp_load_acquire(&c->sid[index]);
+ if (!sid) {
+ rc = sidtab_context_to_sid(sidtab, &c->context[index], &sid);
+ if (rc)
+ return rc;
+
+ /*
+ * Ensure the new sidtab entry is visible to other threads
+ * when they see the SID.
+ */
+ smp_store_release(&c->sid[index], sid);
+ }
+ *out_sid = sid;
+ return 0;
+}
+
/**
* security_port_sid - Obtain the SID for a port.
* @state: SELinux state
@@ -2414,17 +2451,13 @@ retry:
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *out_sid = c->sid[0];
+ if (rc)
+ goto out;
} else {
*out_sid = SECINITSID_PORT;
}
@@ -2473,18 +2506,13 @@ retry:
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab,
- &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *out_sid = c->sid[0];
+ if (rc)
+ goto out;
} else
*out_sid = SECINITSID_UNLABELED;
@@ -2533,17 +2561,13 @@ retry:
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *out_sid = c->sid[0];
+ if (rc)
+ goto out;
} else
*out_sid = SECINITSID_UNLABELED;
@@ -2587,25 +2611,13 @@ retry:
}
if (c) {
- if (!c->sid[0] || !c->sid[1]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
- rc = sidtab_context_to_sid(sidtab, &c->context[1],
- &c->sid[1]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, if_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *if_sid = c->sid[0];
+ if (rc)
+ goto out;
} else
*if_sid = SECINITSID_NETIF;
@@ -2697,18 +2709,13 @@ retry:
}
if (c) {
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab,
- &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, out_sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- *out_sid = c->sid[0];
+ if (rc)
+ goto out;
} else {
*out_sid = SECINITSID_NODE;
}
@@ -2873,7 +2880,7 @@ static inline int __security_genfs_sid(struct selinux_policy *policy,
u16 sclass;
struct genfs *genfs;
struct ocontext *c;
- int rc, cmp = 0;
+ int cmp = 0;
while (path[0] == '/' && path[1] == '/')
path++;
@@ -2887,9 +2894,8 @@ static inline int __security_genfs_sid(struct selinux_policy *policy,
break;
}
- rc = -ENOENT;
if (!genfs || cmp)
- goto out;
+ return -ENOENT;
for (c = genfs->head; c; c = c->next) {
len = strlen(c->u.name);
@@ -2898,20 +2904,10 @@ static inline int __security_genfs_sid(struct selinux_policy *policy,
break;
}
- rc = -ENOENT;
if (!c)
- goto out;
-
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0], &c->sid[0]);
- if (rc)
- goto out;
- }
+ return -ENOENT;
- *sid = c->sid[0];
- rc = 0;
-out:
- return rc;
+ return ocontext_to_sid(sidtab, c, 0, sid);
}
/**
@@ -2996,17 +2992,13 @@ retry:
if (c) {
sbsec->behavior = c->v.behavior;
- if (!c->sid[0]) {
- rc = sidtab_context_to_sid(sidtab, &c->context[0],
- &c->sid[0]);
- if (rc == -ESTALE) {
- rcu_read_unlock();
- goto retry;
- }
- if (rc)
- goto out;
+ rc = ocontext_to_sid(sidtab, c, 0, &sbsec->sid);
+ if (rc == -ESTALE) {
+ rcu_read_unlock();
+ goto retry;
}
- sbsec->sid = c->sid[0];
+ if (rc)
+ goto out;
} else {
rc = __security_genfs_sid(policy, fstype, "/",
SECCLASS_DIR, &sbsec->sid);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4114978dcd24e72415276bba60ff4ff355970bbc Mon Sep 17 00:00:00 2001
From: Sean Young <sean(a)mess.org>
Date: Tue, 14 Sep 2021 16:57:46 +0200
Subject: [PATCH] media: ir_toy: prevent device from hanging during transmit
If the IR Toy is receiving IR while a transmit is done, it may end up
hanging. We can prevent this from happening by re-entering sample mode
just before issuing the transmit command.
Link: https://github.com/bengtmartensson/HarcHardware/discussions/25
Cc: stable(a)vger.kernel.org
[mchehab: renamed: s/STATE_RESET/STATE_COMMAND_NO_RESP/ ]
Signed-off-by: Sean Young <sean(a)mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
diff --git a/drivers/media/rc/ir_toy.c b/drivers/media/rc/ir_toy.c
index d2d9346eb8f5..71aced52248f 100644
--- a/drivers/media/rc/ir_toy.c
+++ b/drivers/media/rc/ir_toy.c
@@ -26,6 +26,7 @@ static const u8 COMMAND_VERSION[] = { 'v' };
// End transmit and repeat reset command so we exit sump mode
static const u8 COMMAND_RESET[] = { 0xff, 0xff, 0, 0, 0, 0, 0 };
static const u8 COMMAND_SMODE_ENTER[] = { 's' };
+static const u8 COMMAND_SMODE_EXIT[] = { 0 };
static const u8 COMMAND_TXSTART[] = { 0x26, 0x24, 0x25, 0x03 };
#define REPLY_XMITCOUNT 't'
@@ -317,12 +318,30 @@ static int irtoy_tx(struct rc_dev *rc, uint *txbuf, uint count)
buf[i] = cpu_to_be16(v);
}
- buf[count] = cpu_to_be16(0xffff);
+ buf[count] = 0xffff;
irtoy->tx_buf = buf;
irtoy->tx_len = size;
irtoy->emitted = 0;
+ // There is an issue where if the unit is receiving IR while the
+ // first TXSTART command is sent, the device might end up hanging
+ // with its led on. It does not respond to any command when this
+ // happens. To work around this, re-enter sample mode.
+ err = irtoy_command(irtoy, COMMAND_SMODE_EXIT,
+ sizeof(COMMAND_SMODE_EXIT), STATE_COMMAND_NO_RESP);
+ if (err) {
+ dev_err(irtoy->dev, "exit sample mode: %d\n", err);
+ return err;
+ }
+
+ err = irtoy_command(irtoy, COMMAND_SMODE_ENTER,
+ sizeof(COMMAND_SMODE_ENTER), STATE_COMMAND);
+ if (err) {
+ dev_err(irtoy->dev, "enter sample mode: %d\n", err);
+ return err;
+ }
+
err = irtoy_command(irtoy, COMMAND_TXSTART, sizeof(COMMAND_TXSTART),
STATE_TX);
kfree(buf);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4114978dcd24e72415276bba60ff4ff355970bbc Mon Sep 17 00:00:00 2001
From: Sean Young <sean(a)mess.org>
Date: Tue, 14 Sep 2021 16:57:46 +0200
Subject: [PATCH] media: ir_toy: prevent device from hanging during transmit
If the IR Toy is receiving IR while a transmit is done, it may end up
hanging. We can prevent this from happening by re-entering sample mode
just before issuing the transmit command.
Link: https://github.com/bengtmartensson/HarcHardware/discussions/25
Cc: stable(a)vger.kernel.org
[mchehab: renamed: s/STATE_RESET/STATE_COMMAND_NO_RESP/ ]
Signed-off-by: Sean Young <sean(a)mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
diff --git a/drivers/media/rc/ir_toy.c b/drivers/media/rc/ir_toy.c
index d2d9346eb8f5..71aced52248f 100644
--- a/drivers/media/rc/ir_toy.c
+++ b/drivers/media/rc/ir_toy.c
@@ -26,6 +26,7 @@ static const u8 COMMAND_VERSION[] = { 'v' };
// End transmit and repeat reset command so we exit sump mode
static const u8 COMMAND_RESET[] = { 0xff, 0xff, 0, 0, 0, 0, 0 };
static const u8 COMMAND_SMODE_ENTER[] = { 's' };
+static const u8 COMMAND_SMODE_EXIT[] = { 0 };
static const u8 COMMAND_TXSTART[] = { 0x26, 0x24, 0x25, 0x03 };
#define REPLY_XMITCOUNT 't'
@@ -317,12 +318,30 @@ static int irtoy_tx(struct rc_dev *rc, uint *txbuf, uint count)
buf[i] = cpu_to_be16(v);
}
- buf[count] = cpu_to_be16(0xffff);
+ buf[count] = 0xffff;
irtoy->tx_buf = buf;
irtoy->tx_len = size;
irtoy->emitted = 0;
+ // There is an issue where if the unit is receiving IR while the
+ // first TXSTART command is sent, the device might end up hanging
+ // with its led on. It does not respond to any command when this
+ // happens. To work around this, re-enter sample mode.
+ err = irtoy_command(irtoy, COMMAND_SMODE_EXIT,
+ sizeof(COMMAND_SMODE_EXIT), STATE_COMMAND_NO_RESP);
+ if (err) {
+ dev_err(irtoy->dev, "exit sample mode: %d\n", err);
+ return err;
+ }
+
+ err = irtoy_command(irtoy, COMMAND_SMODE_ENTER,
+ sizeof(COMMAND_SMODE_ENTER), STATE_COMMAND);
+ if (err) {
+ dev_err(irtoy->dev, "enter sample mode: %d\n", err);
+ return err;
+ }
+
err = irtoy_command(irtoy, COMMAND_TXSTART, sizeof(COMMAND_TXSTART),
STATE_TX);
kfree(buf);
The patch below does not apply to the 5.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4114978dcd24e72415276bba60ff4ff355970bbc Mon Sep 17 00:00:00 2001
From: Sean Young <sean(a)mess.org>
Date: Tue, 14 Sep 2021 16:57:46 +0200
Subject: [PATCH] media: ir_toy: prevent device from hanging during transmit
If the IR Toy is receiving IR while a transmit is done, it may end up
hanging. We can prevent this from happening by re-entering sample mode
just before issuing the transmit command.
Link: https://github.com/bengtmartensson/HarcHardware/discussions/25
Cc: stable(a)vger.kernel.org
[mchehab: renamed: s/STATE_RESET/STATE_COMMAND_NO_RESP/ ]
Signed-off-by: Sean Young <sean(a)mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
diff --git a/drivers/media/rc/ir_toy.c b/drivers/media/rc/ir_toy.c
index d2d9346eb8f5..71aced52248f 100644
--- a/drivers/media/rc/ir_toy.c
+++ b/drivers/media/rc/ir_toy.c
@@ -26,6 +26,7 @@ static const u8 COMMAND_VERSION[] = { 'v' };
// End transmit and repeat reset command so we exit sump mode
static const u8 COMMAND_RESET[] = { 0xff, 0xff, 0, 0, 0, 0, 0 };
static const u8 COMMAND_SMODE_ENTER[] = { 's' };
+static const u8 COMMAND_SMODE_EXIT[] = { 0 };
static const u8 COMMAND_TXSTART[] = { 0x26, 0x24, 0x25, 0x03 };
#define REPLY_XMITCOUNT 't'
@@ -317,12 +318,30 @@ static int irtoy_tx(struct rc_dev *rc, uint *txbuf, uint count)
buf[i] = cpu_to_be16(v);
}
- buf[count] = cpu_to_be16(0xffff);
+ buf[count] = 0xffff;
irtoy->tx_buf = buf;
irtoy->tx_len = size;
irtoy->emitted = 0;
+ // There is an issue where if the unit is receiving IR while the
+ // first TXSTART command is sent, the device might end up hanging
+ // with its led on. It does not respond to any command when this
+ // happens. To work around this, re-enter sample mode.
+ err = irtoy_command(irtoy, COMMAND_SMODE_EXIT,
+ sizeof(COMMAND_SMODE_EXIT), STATE_COMMAND_NO_RESP);
+ if (err) {
+ dev_err(irtoy->dev, "exit sample mode: %d\n", err);
+ return err;
+ }
+
+ err = irtoy_command(irtoy, COMMAND_SMODE_ENTER,
+ sizeof(COMMAND_SMODE_ENTER), STATE_COMMAND);
+ if (err) {
+ dev_err(irtoy->dev, "enter sample mode: %d\n", err);
+ return err;
+ }
+
err = irtoy_command(irtoy, COMMAND_TXSTART, sizeof(COMMAND_TXSTART),
STATE_TX);
kfree(buf);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ec5a4919fa7b7d8c7a2af1c7e799b1fe4be84343 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 8 Oct 2021 17:11:05 -0700
Subject: [PATCH] KVM: VMX: Unregister posted interrupt wakeup handler on
hardware unsetup
Unregister KVM's posted interrupt wakeup handler during unsetup so that a
spurious interrupt that arrives after kvm_intel.ko is unloaded doesn't
call into freed memory.
Fixes: bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU is blocked")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20211009001107.3936588-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2f677e72d864..c11688f64e80 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7555,6 +7555,8 @@ static void vmx_migrate_timers(struct kvm_vcpu *vcpu)
static void hardware_unsetup(void)
{
+ kvm_set_posted_intr_wakeup_handler(NULL);
+
if (nested)
nested_vmx_hardware_unsetup();
@@ -7885,8 +7887,6 @@ static __init int hardware_setup(void)
vmx_x86_ops.request_immediate_exit = __kvm_request_immediate_exit;
}
- kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
-
kvm_mce_cap_supported |= MCG_LMCE_P;
if (pt_mode != PT_MODE_SYSTEM && pt_mode != PT_MODE_HOST_GUEST)
@@ -7910,6 +7910,9 @@ static __init int hardware_setup(void)
r = alloc_kvm_area();
if (r)
nested_vmx_hardware_unsetup();
+
+ kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
+
return r;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ec5a4919fa7b7d8c7a2af1c7e799b1fe4be84343 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 8 Oct 2021 17:11:05 -0700
Subject: [PATCH] KVM: VMX: Unregister posted interrupt wakeup handler on
hardware unsetup
Unregister KVM's posted interrupt wakeup handler during unsetup so that a
spurious interrupt that arrives after kvm_intel.ko is unloaded doesn't
call into freed memory.
Fixes: bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU is blocked")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20211009001107.3936588-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2f677e72d864..c11688f64e80 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7555,6 +7555,8 @@ static void vmx_migrate_timers(struct kvm_vcpu *vcpu)
static void hardware_unsetup(void)
{
+ kvm_set_posted_intr_wakeup_handler(NULL);
+
if (nested)
nested_vmx_hardware_unsetup();
@@ -7885,8 +7887,6 @@ static __init int hardware_setup(void)
vmx_x86_ops.request_immediate_exit = __kvm_request_immediate_exit;
}
- kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
-
kvm_mce_cap_supported |= MCG_LMCE_P;
if (pt_mode != PT_MODE_SYSTEM && pt_mode != PT_MODE_HOST_GUEST)
@@ -7910,6 +7910,9 @@ static __init int hardware_setup(void)
r = alloc_kvm_area();
if (r)
nested_vmx_hardware_unsetup();
+
+ kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
+
return r;
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ec5a4919fa7b7d8c7a2af1c7e799b1fe4be84343 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 8 Oct 2021 17:11:05 -0700
Subject: [PATCH] KVM: VMX: Unregister posted interrupt wakeup handler on
hardware unsetup
Unregister KVM's posted interrupt wakeup handler during unsetup so that a
spurious interrupt that arrives after kvm_intel.ko is unloaded doesn't
call into freed memory.
Fixes: bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU is blocked")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20211009001107.3936588-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2f677e72d864..c11688f64e80 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7555,6 +7555,8 @@ static void vmx_migrate_timers(struct kvm_vcpu *vcpu)
static void hardware_unsetup(void)
{
+ kvm_set_posted_intr_wakeup_handler(NULL);
+
if (nested)
nested_vmx_hardware_unsetup();
@@ -7885,8 +7887,6 @@ static __init int hardware_setup(void)
vmx_x86_ops.request_immediate_exit = __kvm_request_immediate_exit;
}
- kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
-
kvm_mce_cap_supported |= MCG_LMCE_P;
if (pt_mode != PT_MODE_SYSTEM && pt_mode != PT_MODE_HOST_GUEST)
@@ -7910,6 +7910,9 @@ static __init int hardware_setup(void)
r = alloc_kvm_area();
if (r)
nested_vmx_hardware_unsetup();
+
+ kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
+
return r;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ec5a4919fa7b7d8c7a2af1c7e799b1fe4be84343 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 8 Oct 2021 17:11:05 -0700
Subject: [PATCH] KVM: VMX: Unregister posted interrupt wakeup handler on
hardware unsetup
Unregister KVM's posted interrupt wakeup handler during unsetup so that a
spurious interrupt that arrives after kvm_intel.ko is unloaded doesn't
call into freed memory.
Fixes: bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU is blocked")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20211009001107.3936588-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2f677e72d864..c11688f64e80 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7555,6 +7555,8 @@ static void vmx_migrate_timers(struct kvm_vcpu *vcpu)
static void hardware_unsetup(void)
{
+ kvm_set_posted_intr_wakeup_handler(NULL);
+
if (nested)
nested_vmx_hardware_unsetup();
@@ -7885,8 +7887,6 @@ static __init int hardware_setup(void)
vmx_x86_ops.request_immediate_exit = __kvm_request_immediate_exit;
}
- kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
-
kvm_mce_cap_supported |= MCG_LMCE_P;
if (pt_mode != PT_MODE_SYSTEM && pt_mode != PT_MODE_HOST_GUEST)
@@ -7910,6 +7910,9 @@ static __init int hardware_setup(void)
r = alloc_kvm_area();
if (r)
nested_vmx_hardware_unsetup();
+
+ kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
+
return r;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ec5a4919fa7b7d8c7a2af1c7e799b1fe4be84343 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 8 Oct 2021 17:11:05 -0700
Subject: [PATCH] KVM: VMX: Unregister posted interrupt wakeup handler on
hardware unsetup
Unregister KVM's posted interrupt wakeup handler during unsetup so that a
spurious interrupt that arrives after kvm_intel.ko is unloaded doesn't
call into freed memory.
Fixes: bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU is blocked")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20211009001107.3936588-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2f677e72d864..c11688f64e80 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7555,6 +7555,8 @@ static void vmx_migrate_timers(struct kvm_vcpu *vcpu)
static void hardware_unsetup(void)
{
+ kvm_set_posted_intr_wakeup_handler(NULL);
+
if (nested)
nested_vmx_hardware_unsetup();
@@ -7885,8 +7887,6 @@ static __init int hardware_setup(void)
vmx_x86_ops.request_immediate_exit = __kvm_request_immediate_exit;
}
- kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
-
kvm_mce_cap_supported |= MCG_LMCE_P;
if (pt_mode != PT_MODE_SYSTEM && pt_mode != PT_MODE_HOST_GUEST)
@@ -7910,6 +7910,9 @@ static __init int hardware_setup(void)
r = alloc_kvm_area();
if (r)
nested_vmx_hardware_unsetup();
+
+ kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
+
return r;
}
The patch below does not apply to the 5.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 2bb2e00ed9787e52580bb651264b8d6a2b7a9dd2 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Wed, 13 Oct 2021 10:12:49 +0100
Subject: [PATCH] btrfs: fix deadlock between chunk allocation and chunk btree
modifications
When a task is doing some modification to the chunk btree and it is not in
the context of a chunk allocation or a chunk removal, it can deadlock with
another task that is currently allocating a new data or metadata chunk.
These contexts are the following:
* When relocating a system chunk, when we need to COW the extent buffers
that belong to the chunk btree;
* When adding a new device (ioctl), where we need to add a new device item
to the chunk btree;
* When removing a device (ioctl), where we need to remove a device item
from the chunk btree;
* When resizing a device (ioctl), where we need to update a device item in
the chunk btree and may need to relocate a system chunk that lies beyond
the new device size when shrinking a device.
The problem happens due to a sequence of steps like the following:
1) Task A starts a data or metadata chunk allocation and it locks the
chunk mutex;
2) Task B is relocating a system chunk, and when it needs to COW an extent
buffer of the chunk btree, it has locked both that extent buffer as
well as its parent extent buffer;
3) Since there is not enough available system space, either because none
of the existing system block groups have enough free space or because
the only one with enough free space is in RO mode due to the relocation,
task B triggers a new system chunk allocation. It blocks when trying to
acquire the chunk mutex, currently held by task A;
4) Task A enters btrfs_chunk_alloc_add_chunk_item(), in order to insert
the new chunk item into the chunk btree and update the existing device
items there. But in order to do that, it has to lock the extent buffer
that task B locked at step 2, or its parent extent buffer, but task B
is waiting on the chunk mutex, which is currently locked by task A,
therefore resulting in a deadlock.
One example report when the deadlock happens with system chunk relocation:
INFO: task kworker/u9:5:546 blocked for more than 143 seconds.
Not tainted 5.15.0-rc3+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u9:5 state:D stack:25936 pid: 546 ppid: 2 flags:0x00004000
Workqueue: events_unbound btrfs_async_reclaim_metadata_space
Call Trace:
context_switch kernel/sched/core.c:4940 [inline]
__schedule+0xcd9/0x2530 kernel/sched/core.c:6287
schedule+0xd3/0x270 kernel/sched/core.c:6366
rwsem_down_read_slowpath+0x4ee/0x9d0 kernel/locking/rwsem.c:993
__down_read_common kernel/locking/rwsem.c:1214 [inline]
__down_read kernel/locking/rwsem.c:1223 [inline]
down_read_nested+0xe6/0x440 kernel/locking/rwsem.c:1590
__btrfs_tree_read_lock+0x31/0x350 fs/btrfs/locking.c:47
btrfs_tree_read_lock fs/btrfs/locking.c:54 [inline]
btrfs_read_lock_root_node+0x8a/0x320 fs/btrfs/locking.c:191
btrfs_search_slot_get_root fs/btrfs/ctree.c:1623 [inline]
btrfs_search_slot+0x13b4/0x2140 fs/btrfs/ctree.c:1728
btrfs_update_device+0x11f/0x500 fs/btrfs/volumes.c:2794
btrfs_chunk_alloc_add_chunk_item+0x34d/0xea0 fs/btrfs/volumes.c:5504
do_chunk_alloc fs/btrfs/block-group.c:3408 [inline]
btrfs_chunk_alloc+0x84d/0xf50 fs/btrfs/block-group.c:3653
flush_space+0x54e/0xd80 fs/btrfs/space-info.c:670
btrfs_async_reclaim_metadata_space+0x396/0xa90 fs/btrfs/space-info.c:953
process_one_work+0x9df/0x16d0 kernel/workqueue.c:2297
worker_thread+0x90/0xed0 kernel/workqueue.c:2444
kthread+0x3e5/0x4d0 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
INFO: task syz-executor:9107 blocked for more than 143 seconds.
Not tainted 5.15.0-rc3+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor state:D stack:23200 pid: 9107 ppid: 7792 flags:0x00004004
Call Trace:
context_switch kernel/sched/core.c:4940 [inline]
__schedule+0xcd9/0x2530 kernel/sched/core.c:6287
schedule+0xd3/0x270 kernel/sched/core.c:6366
schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6425
__mutex_lock_common kernel/locking/mutex.c:669 [inline]
__mutex_lock+0xc96/0x1680 kernel/locking/mutex.c:729
btrfs_chunk_alloc+0x31a/0xf50 fs/btrfs/block-group.c:3631
find_free_extent_update_loop fs/btrfs/extent-tree.c:3986 [inline]
find_free_extent+0x25cb/0x3a30 fs/btrfs/extent-tree.c:4335
btrfs_reserve_extent+0x1f1/0x500 fs/btrfs/extent-tree.c:4415
btrfs_alloc_tree_block+0x203/0x1120 fs/btrfs/extent-tree.c:4813
__btrfs_cow_block+0x412/0x1620 fs/btrfs/ctree.c:415
btrfs_cow_block+0x2f6/0x8c0 fs/btrfs/ctree.c:570
btrfs_search_slot+0x1094/0x2140 fs/btrfs/ctree.c:1768
relocate_tree_block fs/btrfs/relocation.c:2694 [inline]
relocate_tree_blocks+0xf73/0x1770 fs/btrfs/relocation.c:2757
relocate_block_group+0x47e/0xc70 fs/btrfs/relocation.c:3673
btrfs_relocate_block_group+0x48a/0xc60 fs/btrfs/relocation.c:4070
btrfs_relocate_chunk+0x96/0x280 fs/btrfs/volumes.c:3181
__btrfs_balance fs/btrfs/volumes.c:3911 [inline]
btrfs_balance+0x1f03/0x3cd0 fs/btrfs/volumes.c:4301
btrfs_ioctl_balance+0x61e/0x800 fs/btrfs/ioctl.c:4137
btrfs_ioctl+0x39ea/0x7b70 fs/btrfs/ioctl.c:4949
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:874 [inline]
__se_sys_ioctl fs/ioctl.c:860 [inline]
__x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
So fix this by making sure that whenever we try to modify the chunk btree
and we are neither in a chunk allocation context nor in a chunk remove
context, we reserve system space before modifying the chunk btree.
Reported-by: Hao Sun <sunhao.th(a)gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CACkBjsax51i4mu6C0C3vJqQN3NR_iVuucoeG3U…
Fixes: 79bd37120b1495 ("btrfs: rework chunk allocation to avoid exhaustion of the system chunk array")
CC: stable(a)vger.kernel.org # 5.14+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index de9aeb3733cf..f971d043469c 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -3425,25 +3425,6 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags)
goto out;
}
- /*
- * If this is a system chunk allocation then stop right here and do not
- * add the chunk item to the chunk btree. This is to prevent a deadlock
- * because this system chunk allocation can be triggered while COWing
- * some extent buffer of the chunk btree and while holding a lock on a
- * parent extent buffer, in which case attempting to insert the chunk
- * item (or update the device item) would result in a deadlock on that
- * parent extent buffer. In this case defer the chunk btree updates to
- * the second phase of chunk allocation and keep our reservation until
- * the second phase completes.
- *
- * This is a rare case and can only be triggered by the very few cases
- * we have where we need to touch the chunk btree outside chunk allocation
- * and chunk removal. These cases are basically adding a device, removing
- * a device or resizing a device.
- */
- if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
- return 0;
-
ret = btrfs_chunk_alloc_add_chunk_item(trans, bg);
/*
* Normally we are not expected to fail with -ENOSPC here, since we have
@@ -3576,14 +3557,14 @@ out:
* This has happened before and commit eafa4fd0ad0607 ("btrfs: fix exhaustion of
* the system chunk array due to concurrent allocations") provides more details.
*
- * For allocation of system chunks, we defer the updates and insertions into the
- * chunk btree to phase 2. This is to prevent deadlocks on extent buffers because
- * if the chunk allocation is triggered while COWing an extent buffer of the
- * chunk btree, we are holding a lock on the parent of that extent buffer and
- * doing the chunk btree updates and insertions can require locking that parent.
- * This is for the very few and rare cases where we update the chunk btree that
- * are not chunk allocation or chunk removal: adding a device, removing a device
- * or resizing a device.
+ * Allocation of system chunks does not happen through this function. A task that
+ * needs to update the chunk btree (the only btree that uses system chunks), must
+ * preallocate chunk space by calling either check_system_chunk() or
+ * btrfs_reserve_chunk_metadata() - the former is used when allocating a data or
+ * metadata chunk or when removing a chunk, while the later is used before doing
+ * a modification to the chunk btree - use cases for the later are adding,
+ * removing and resizing a device as well as relocation of a system chunk.
+ * See the comment below for more details.
*
* The reservation of system space, done through check_system_chunk(), as well
* as all the updates and insertions into the chunk btree must be done while
@@ -3620,11 +3601,27 @@ int btrfs_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
if (trans->allocating_chunk)
return -ENOSPC;
/*
- * If we are removing a chunk, don't re-enter or we would deadlock.
- * System space reservation and system chunk allocation is done by the
- * chunk remove operation (btrfs_remove_chunk()).
+ * Allocation of system chunks can not happen through this path, as we
+ * could end up in a deadlock if we are allocating a data or metadata
+ * chunk and there is another task modifying the chunk btree.
+ *
+ * This is because while we are holding the chunk mutex, we will attempt
+ * to add the new chunk item to the chunk btree or update an existing
+ * device item in the chunk btree, while the other task that is modifying
+ * the chunk btree is attempting to COW an extent buffer while holding a
+ * lock on it and on its parent - if the COW operation triggers a system
+ * chunk allocation, then we can deadlock because we are holding the
+ * chunk mutex and we may need to access that extent buffer or its parent
+ * in order to add the chunk item or update a device item.
+ *
+ * Tasks that want to modify the chunk tree should reserve system space
+ * before updating the chunk btree, by calling either
+ * btrfs_reserve_chunk_metadata() or check_system_chunk().
+ * It's possible that after a task reserves the space, it still ends up
+ * here - this happens in the cases described above at do_chunk_alloc().
+ * The task will have to either retry or fail.
*/
- if (trans->removing_chunk)
+ if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
return -ENOSPC;
space_info = btrfs_find_space_info(fs_info, flags);
@@ -3723,17 +3720,14 @@ static u64 get_profile_num_devs(struct btrfs_fs_info *fs_info, u64 type)
return num_dev;
}
-/*
- * Reserve space in the system space for allocating or removing a chunk
- */
-void check_system_chunk(struct btrfs_trans_handle *trans, u64 type)
+static void reserve_chunk_space(struct btrfs_trans_handle *trans,
+ u64 bytes,
+ u64 type)
{
struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_space_info *info;
u64 left;
- u64 thresh;
int ret = 0;
- u64 num_devs;
/*
* Needed because we can end up allocating a system chunk and for an
@@ -3746,19 +3740,13 @@ void check_system_chunk(struct btrfs_trans_handle *trans, u64 type)
left = info->total_bytes - btrfs_space_info_used(info, true);
spin_unlock(&info->lock);
- num_devs = get_profile_num_devs(fs_info, type);
-
- /* num_devs device items to update and 1 chunk item to add or remove */
- thresh = btrfs_calc_metadata_size(fs_info, num_devs) +
- btrfs_calc_insert_metadata_size(fs_info, 1);
-
- if (left < thresh && btrfs_test_opt(fs_info, ENOSPC_DEBUG)) {
+ if (left < bytes && btrfs_test_opt(fs_info, ENOSPC_DEBUG)) {
btrfs_info(fs_info, "left=%llu, need=%llu, flags=%llu",
- left, thresh, type);
+ left, bytes, type);
btrfs_dump_space_info(fs_info, info, 0, 0);
}
- if (left < thresh) {
+ if (left < bytes) {
u64 flags = btrfs_system_alloc_profile(fs_info);
struct btrfs_block_group *bg;
@@ -3767,21 +3755,20 @@ void check_system_chunk(struct btrfs_trans_handle *trans, u64 type)
* needing it, as we might not need to COW all nodes/leafs from
* the paths we visit in the chunk tree (they were already COWed
* or created in the current transaction for example).
- *
- * Also, if our caller is allocating a system chunk, do not
- * attempt to insert the chunk item in the chunk btree, as we
- * could deadlock on an extent buffer since our caller may be
- * COWing an extent buffer from the chunk btree.
*/
bg = btrfs_create_chunk(trans, flags);
if (IS_ERR(bg)) {
ret = PTR_ERR(bg);
- } else if (!(type & BTRFS_BLOCK_GROUP_SYSTEM)) {
+ } else {
/*
* If we fail to add the chunk item here, we end up
* trying again at phase 2 of chunk allocation, at
* btrfs_create_pending_block_groups(). So ignore
- * any error here.
+ * any error here. An ENOSPC here could happen, due to
+ * the cases described at do_chunk_alloc() - the system
+ * block group we just created was just turned into RO
+ * mode by a scrub for example, or a running discard
+ * temporarily removed its free space entries, etc.
*/
btrfs_chunk_alloc_add_chunk_item(trans, bg);
}
@@ -3790,12 +3777,61 @@ void check_system_chunk(struct btrfs_trans_handle *trans, u64 type)
if (!ret) {
ret = btrfs_block_rsv_add(fs_info->chunk_root,
&fs_info->chunk_block_rsv,
- thresh, BTRFS_RESERVE_NO_FLUSH);
+ bytes, BTRFS_RESERVE_NO_FLUSH);
if (!ret)
- trans->chunk_bytes_reserved += thresh;
+ trans->chunk_bytes_reserved += bytes;
}
}
+/*
+ * Reserve space in the system space for allocating or removing a chunk.
+ * The caller must be holding fs_info->chunk_mutex.
+ */
+void check_system_chunk(struct btrfs_trans_handle *trans, u64 type)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ const u64 num_devs = get_profile_num_devs(fs_info, type);
+ u64 bytes;
+
+ /* num_devs device items to update and 1 chunk item to add or remove. */
+ bytes = btrfs_calc_metadata_size(fs_info, num_devs) +
+ btrfs_calc_insert_metadata_size(fs_info, 1);
+
+ reserve_chunk_space(trans, bytes, type);
+}
+
+/*
+ * Reserve space in the system space, if needed, for doing a modification to the
+ * chunk btree.
+ *
+ * @trans: A transaction handle.
+ * @is_item_insertion: Indicate if the modification is for inserting a new item
+ * in the chunk btree or if it's for the deletion or update
+ * of an existing item.
+ *
+ * This is used in a context where we need to update the chunk btree outside
+ * block group allocation and removal, to avoid a deadlock with a concurrent
+ * task that is allocating a metadata or data block group and therefore needs to
+ * update the chunk btree while holding the chunk mutex. After the update to the
+ * chunk btree is done, btrfs_trans_release_chunk_metadata() should be called.
+ *
+ */
+void btrfs_reserve_chunk_metadata(struct btrfs_trans_handle *trans,
+ bool is_item_insertion)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ u64 bytes;
+
+ if (is_item_insertion)
+ bytes = btrfs_calc_insert_metadata_size(fs_info, 1);
+ else
+ bytes = btrfs_calc_metadata_size(fs_info, 1);
+
+ mutex_lock(&fs_info->chunk_mutex);
+ reserve_chunk_space(trans, bytes, BTRFS_BLOCK_GROUP_SYSTEM);
+ mutex_unlock(&fs_info->chunk_mutex);
+}
+
void btrfs_put_block_group_cache(struct btrfs_fs_info *info)
{
struct btrfs_block_group *block_group;
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 07f977d3816c..5878b7ce3b78 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -293,6 +293,8 @@ int btrfs_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
enum btrfs_chunk_alloc_enum force);
int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans, u64 type);
void check_system_chunk(struct btrfs_trans_handle *trans, const u64 type);
+void btrfs_reserve_chunk_metadata(struct btrfs_trans_handle *trans,
+ bool is_item_insertion);
u64 btrfs_get_alloc_profile(struct btrfs_fs_info *fs_info, u64 orig_flags);
void btrfs_put_block_group_cache(struct btrfs_fs_info *info);
int btrfs_free_block_groups(struct btrfs_fs_info *info);
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index fed823596248..33a0ee7ac590 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2692,8 +2692,12 @@ static int relocate_tree_block(struct btrfs_trans_handle *trans,
list_add_tail(&node->list, &rc->backref_cache.changed);
} else {
path->lowest_level = node->level;
+ if (root == root->fs_info->chunk_root)
+ btrfs_reserve_chunk_metadata(trans, false);
ret = btrfs_search_slot(trans, root, key, path, 0, 1);
btrfs_release_path(path);
+ if (root == root->fs_info->chunk_root)
+ btrfs_trans_release_chunk_metadata(trans);
if (ret > 0)
ret = 0;
}
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index debba6f04858..9eab8a741166 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1847,8 +1847,10 @@ static int btrfs_add_dev_item(struct btrfs_trans_handle *trans,
key.type = BTRFS_DEV_ITEM_KEY;
key.offset = device->devid;
+ btrfs_reserve_chunk_metadata(trans, true);
ret = btrfs_insert_empty_item(trans, trans->fs_info->chunk_root, path,
&key, sizeof(*dev_item));
+ btrfs_trans_release_chunk_metadata(trans);
if (ret)
goto out;
@@ -1921,7 +1923,9 @@ static int btrfs_rm_dev_item(struct btrfs_device *device)
key.type = BTRFS_DEV_ITEM_KEY;
key.offset = device->devid;
+ btrfs_reserve_chunk_metadata(trans, false);
ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
+ btrfs_trans_release_chunk_metadata(trans);
if (ret) {
if (ret > 0)
ret = -ENOENT;
@@ -2513,7 +2517,9 @@ static int btrfs_finish_sprout(struct btrfs_trans_handle *trans)
key.type = BTRFS_DEV_ITEM_KEY;
while (1) {
+ btrfs_reserve_chunk_metadata(trans, false);
ret = btrfs_search_slot(trans, root, &key, path, 0, 1);
+ btrfs_trans_release_chunk_metadata(trans);
if (ret < 0)
goto error;
@@ -2862,6 +2868,7 @@ int btrfs_grow_device(struct btrfs_trans_handle *trans,
struct btrfs_super_block *super_copy = fs_info->super_copy;
u64 old_total;
u64 diff;
+ int ret;
if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state))
return -EACCES;
@@ -2890,7 +2897,11 @@ int btrfs_grow_device(struct btrfs_trans_handle *trans,
&trans->transaction->dev_update_list);
mutex_unlock(&fs_info->chunk_mutex);
- return btrfs_update_device(trans, device);
+ btrfs_reserve_chunk_metadata(trans, false);
+ ret = btrfs_update_device(trans, device);
+ btrfs_trans_release_chunk_metadata(trans);
+
+ return ret;
}
static int btrfs_free_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
@@ -4925,8 +4936,10 @@ again:
round_down(old_total - diff, fs_info->sectorsize));
mutex_unlock(&fs_info->chunk_mutex);
+ btrfs_reserve_chunk_metadata(trans, false);
/* Now btrfs_update_device() will change the on-disk size. */
ret = btrfs_update_device(trans, device);
+ btrfs_trans_release_chunk_metadata(trans);
if (ret < 0) {
btrfs_abort_transaction(trans, ret);
btrfs_end_transaction(trans);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 2bb2e00ed9787e52580bb651264b8d6a2b7a9dd2 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Wed, 13 Oct 2021 10:12:49 +0100
Subject: [PATCH] btrfs: fix deadlock between chunk allocation and chunk btree
modifications
When a task is doing some modification to the chunk btree and it is not in
the context of a chunk allocation or a chunk removal, it can deadlock with
another task that is currently allocating a new data or metadata chunk.
These contexts are the following:
* When relocating a system chunk, when we need to COW the extent buffers
that belong to the chunk btree;
* When adding a new device (ioctl), where we need to add a new device item
to the chunk btree;
* When removing a device (ioctl), where we need to remove a device item
from the chunk btree;
* When resizing a device (ioctl), where we need to update a device item in
the chunk btree and may need to relocate a system chunk that lies beyond
the new device size when shrinking a device.
The problem happens due to a sequence of steps like the following:
1) Task A starts a data or metadata chunk allocation and it locks the
chunk mutex;
2) Task B is relocating a system chunk, and when it needs to COW an extent
buffer of the chunk btree, it has locked both that extent buffer as
well as its parent extent buffer;
3) Since there is not enough available system space, either because none
of the existing system block groups have enough free space or because
the only one with enough free space is in RO mode due to the relocation,
task B triggers a new system chunk allocation. It blocks when trying to
acquire the chunk mutex, currently held by task A;
4) Task A enters btrfs_chunk_alloc_add_chunk_item(), in order to insert
the new chunk item into the chunk btree and update the existing device
items there. But in order to do that, it has to lock the extent buffer
that task B locked at step 2, or its parent extent buffer, but task B
is waiting on the chunk mutex, which is currently locked by task A,
therefore resulting in a deadlock.
One example report when the deadlock happens with system chunk relocation:
INFO: task kworker/u9:5:546 blocked for more than 143 seconds.
Not tainted 5.15.0-rc3+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u9:5 state:D stack:25936 pid: 546 ppid: 2 flags:0x00004000
Workqueue: events_unbound btrfs_async_reclaim_metadata_space
Call Trace:
context_switch kernel/sched/core.c:4940 [inline]
__schedule+0xcd9/0x2530 kernel/sched/core.c:6287
schedule+0xd3/0x270 kernel/sched/core.c:6366
rwsem_down_read_slowpath+0x4ee/0x9d0 kernel/locking/rwsem.c:993
__down_read_common kernel/locking/rwsem.c:1214 [inline]
__down_read kernel/locking/rwsem.c:1223 [inline]
down_read_nested+0xe6/0x440 kernel/locking/rwsem.c:1590
__btrfs_tree_read_lock+0x31/0x350 fs/btrfs/locking.c:47
btrfs_tree_read_lock fs/btrfs/locking.c:54 [inline]
btrfs_read_lock_root_node+0x8a/0x320 fs/btrfs/locking.c:191
btrfs_search_slot_get_root fs/btrfs/ctree.c:1623 [inline]
btrfs_search_slot+0x13b4/0x2140 fs/btrfs/ctree.c:1728
btrfs_update_device+0x11f/0x500 fs/btrfs/volumes.c:2794
btrfs_chunk_alloc_add_chunk_item+0x34d/0xea0 fs/btrfs/volumes.c:5504
do_chunk_alloc fs/btrfs/block-group.c:3408 [inline]
btrfs_chunk_alloc+0x84d/0xf50 fs/btrfs/block-group.c:3653
flush_space+0x54e/0xd80 fs/btrfs/space-info.c:670
btrfs_async_reclaim_metadata_space+0x396/0xa90 fs/btrfs/space-info.c:953
process_one_work+0x9df/0x16d0 kernel/workqueue.c:2297
worker_thread+0x90/0xed0 kernel/workqueue.c:2444
kthread+0x3e5/0x4d0 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
INFO: task syz-executor:9107 blocked for more than 143 seconds.
Not tainted 5.15.0-rc3+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor state:D stack:23200 pid: 9107 ppid: 7792 flags:0x00004004
Call Trace:
context_switch kernel/sched/core.c:4940 [inline]
__schedule+0xcd9/0x2530 kernel/sched/core.c:6287
schedule+0xd3/0x270 kernel/sched/core.c:6366
schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6425
__mutex_lock_common kernel/locking/mutex.c:669 [inline]
__mutex_lock+0xc96/0x1680 kernel/locking/mutex.c:729
btrfs_chunk_alloc+0x31a/0xf50 fs/btrfs/block-group.c:3631
find_free_extent_update_loop fs/btrfs/extent-tree.c:3986 [inline]
find_free_extent+0x25cb/0x3a30 fs/btrfs/extent-tree.c:4335
btrfs_reserve_extent+0x1f1/0x500 fs/btrfs/extent-tree.c:4415
btrfs_alloc_tree_block+0x203/0x1120 fs/btrfs/extent-tree.c:4813
__btrfs_cow_block+0x412/0x1620 fs/btrfs/ctree.c:415
btrfs_cow_block+0x2f6/0x8c0 fs/btrfs/ctree.c:570
btrfs_search_slot+0x1094/0x2140 fs/btrfs/ctree.c:1768
relocate_tree_block fs/btrfs/relocation.c:2694 [inline]
relocate_tree_blocks+0xf73/0x1770 fs/btrfs/relocation.c:2757
relocate_block_group+0x47e/0xc70 fs/btrfs/relocation.c:3673
btrfs_relocate_block_group+0x48a/0xc60 fs/btrfs/relocation.c:4070
btrfs_relocate_chunk+0x96/0x280 fs/btrfs/volumes.c:3181
__btrfs_balance fs/btrfs/volumes.c:3911 [inline]
btrfs_balance+0x1f03/0x3cd0 fs/btrfs/volumes.c:4301
btrfs_ioctl_balance+0x61e/0x800 fs/btrfs/ioctl.c:4137
btrfs_ioctl+0x39ea/0x7b70 fs/btrfs/ioctl.c:4949
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:874 [inline]
__se_sys_ioctl fs/ioctl.c:860 [inline]
__x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
So fix this by making sure that whenever we try to modify the chunk btree
and we are neither in a chunk allocation context nor in a chunk remove
context, we reserve system space before modifying the chunk btree.
Reported-by: Hao Sun <sunhao.th(a)gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CACkBjsax51i4mu6C0C3vJqQN3NR_iVuucoeG3U…
Fixes: 79bd37120b1495 ("btrfs: rework chunk allocation to avoid exhaustion of the system chunk array")
CC: stable(a)vger.kernel.org # 5.14+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index de9aeb3733cf..f971d043469c 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -3425,25 +3425,6 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags)
goto out;
}
- /*
- * If this is a system chunk allocation then stop right here and do not
- * add the chunk item to the chunk btree. This is to prevent a deadlock
- * because this system chunk allocation can be triggered while COWing
- * some extent buffer of the chunk btree and while holding a lock on a
- * parent extent buffer, in which case attempting to insert the chunk
- * item (or update the device item) would result in a deadlock on that
- * parent extent buffer. In this case defer the chunk btree updates to
- * the second phase of chunk allocation and keep our reservation until
- * the second phase completes.
- *
- * This is a rare case and can only be triggered by the very few cases
- * we have where we need to touch the chunk btree outside chunk allocation
- * and chunk removal. These cases are basically adding a device, removing
- * a device or resizing a device.
- */
- if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
- return 0;
-
ret = btrfs_chunk_alloc_add_chunk_item(trans, bg);
/*
* Normally we are not expected to fail with -ENOSPC here, since we have
@@ -3576,14 +3557,14 @@ out:
* This has happened before and commit eafa4fd0ad0607 ("btrfs: fix exhaustion of
* the system chunk array due to concurrent allocations") provides more details.
*
- * For allocation of system chunks, we defer the updates and insertions into the
- * chunk btree to phase 2. This is to prevent deadlocks on extent buffers because
- * if the chunk allocation is triggered while COWing an extent buffer of the
- * chunk btree, we are holding a lock on the parent of that extent buffer and
- * doing the chunk btree updates and insertions can require locking that parent.
- * This is for the very few and rare cases where we update the chunk btree that
- * are not chunk allocation or chunk removal: adding a device, removing a device
- * or resizing a device.
+ * Allocation of system chunks does not happen through this function. A task that
+ * needs to update the chunk btree (the only btree that uses system chunks), must
+ * preallocate chunk space by calling either check_system_chunk() or
+ * btrfs_reserve_chunk_metadata() - the former is used when allocating a data or
+ * metadata chunk or when removing a chunk, while the later is used before doing
+ * a modification to the chunk btree - use cases for the later are adding,
+ * removing and resizing a device as well as relocation of a system chunk.
+ * See the comment below for more details.
*
* The reservation of system space, done through check_system_chunk(), as well
* as all the updates and insertions into the chunk btree must be done while
@@ -3620,11 +3601,27 @@ int btrfs_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
if (trans->allocating_chunk)
return -ENOSPC;
/*
- * If we are removing a chunk, don't re-enter or we would deadlock.
- * System space reservation and system chunk allocation is done by the
- * chunk remove operation (btrfs_remove_chunk()).
+ * Allocation of system chunks can not happen through this path, as we
+ * could end up in a deadlock if we are allocating a data or metadata
+ * chunk and there is another task modifying the chunk btree.
+ *
+ * This is because while we are holding the chunk mutex, we will attempt
+ * to add the new chunk item to the chunk btree or update an existing
+ * device item in the chunk btree, while the other task that is modifying
+ * the chunk btree is attempting to COW an extent buffer while holding a
+ * lock on it and on its parent - if the COW operation triggers a system
+ * chunk allocation, then we can deadlock because we are holding the
+ * chunk mutex and we may need to access that extent buffer or its parent
+ * in order to add the chunk item or update a device item.
+ *
+ * Tasks that want to modify the chunk tree should reserve system space
+ * before updating the chunk btree, by calling either
+ * btrfs_reserve_chunk_metadata() or check_system_chunk().
+ * It's possible that after a task reserves the space, it still ends up
+ * here - this happens in the cases described above at do_chunk_alloc().
+ * The task will have to either retry or fail.
*/
- if (trans->removing_chunk)
+ if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
return -ENOSPC;
space_info = btrfs_find_space_info(fs_info, flags);
@@ -3723,17 +3720,14 @@ static u64 get_profile_num_devs(struct btrfs_fs_info *fs_info, u64 type)
return num_dev;
}
-/*
- * Reserve space in the system space for allocating or removing a chunk
- */
-void check_system_chunk(struct btrfs_trans_handle *trans, u64 type)
+static void reserve_chunk_space(struct btrfs_trans_handle *trans,
+ u64 bytes,
+ u64 type)
{
struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_space_info *info;
u64 left;
- u64 thresh;
int ret = 0;
- u64 num_devs;
/*
* Needed because we can end up allocating a system chunk and for an
@@ -3746,19 +3740,13 @@ void check_system_chunk(struct btrfs_trans_handle *trans, u64 type)
left = info->total_bytes - btrfs_space_info_used(info, true);
spin_unlock(&info->lock);
- num_devs = get_profile_num_devs(fs_info, type);
-
- /* num_devs device items to update and 1 chunk item to add or remove */
- thresh = btrfs_calc_metadata_size(fs_info, num_devs) +
- btrfs_calc_insert_metadata_size(fs_info, 1);
-
- if (left < thresh && btrfs_test_opt(fs_info, ENOSPC_DEBUG)) {
+ if (left < bytes && btrfs_test_opt(fs_info, ENOSPC_DEBUG)) {
btrfs_info(fs_info, "left=%llu, need=%llu, flags=%llu",
- left, thresh, type);
+ left, bytes, type);
btrfs_dump_space_info(fs_info, info, 0, 0);
}
- if (left < thresh) {
+ if (left < bytes) {
u64 flags = btrfs_system_alloc_profile(fs_info);
struct btrfs_block_group *bg;
@@ -3767,21 +3755,20 @@ void check_system_chunk(struct btrfs_trans_handle *trans, u64 type)
* needing it, as we might not need to COW all nodes/leafs from
* the paths we visit in the chunk tree (they were already COWed
* or created in the current transaction for example).
- *
- * Also, if our caller is allocating a system chunk, do not
- * attempt to insert the chunk item in the chunk btree, as we
- * could deadlock on an extent buffer since our caller may be
- * COWing an extent buffer from the chunk btree.
*/
bg = btrfs_create_chunk(trans, flags);
if (IS_ERR(bg)) {
ret = PTR_ERR(bg);
- } else if (!(type & BTRFS_BLOCK_GROUP_SYSTEM)) {
+ } else {
/*
* If we fail to add the chunk item here, we end up
* trying again at phase 2 of chunk allocation, at
* btrfs_create_pending_block_groups(). So ignore
- * any error here.
+ * any error here. An ENOSPC here could happen, due to
+ * the cases described at do_chunk_alloc() - the system
+ * block group we just created was just turned into RO
+ * mode by a scrub for example, or a running discard
+ * temporarily removed its free space entries, etc.
*/
btrfs_chunk_alloc_add_chunk_item(trans, bg);
}
@@ -3790,12 +3777,61 @@ void check_system_chunk(struct btrfs_trans_handle *trans, u64 type)
if (!ret) {
ret = btrfs_block_rsv_add(fs_info->chunk_root,
&fs_info->chunk_block_rsv,
- thresh, BTRFS_RESERVE_NO_FLUSH);
+ bytes, BTRFS_RESERVE_NO_FLUSH);
if (!ret)
- trans->chunk_bytes_reserved += thresh;
+ trans->chunk_bytes_reserved += bytes;
}
}
+/*
+ * Reserve space in the system space for allocating or removing a chunk.
+ * The caller must be holding fs_info->chunk_mutex.
+ */
+void check_system_chunk(struct btrfs_trans_handle *trans, u64 type)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ const u64 num_devs = get_profile_num_devs(fs_info, type);
+ u64 bytes;
+
+ /* num_devs device items to update and 1 chunk item to add or remove. */
+ bytes = btrfs_calc_metadata_size(fs_info, num_devs) +
+ btrfs_calc_insert_metadata_size(fs_info, 1);
+
+ reserve_chunk_space(trans, bytes, type);
+}
+
+/*
+ * Reserve space in the system space, if needed, for doing a modification to the
+ * chunk btree.
+ *
+ * @trans: A transaction handle.
+ * @is_item_insertion: Indicate if the modification is for inserting a new item
+ * in the chunk btree or if it's for the deletion or update
+ * of an existing item.
+ *
+ * This is used in a context where we need to update the chunk btree outside
+ * block group allocation and removal, to avoid a deadlock with a concurrent
+ * task that is allocating a metadata or data block group and therefore needs to
+ * update the chunk btree while holding the chunk mutex. After the update to the
+ * chunk btree is done, btrfs_trans_release_chunk_metadata() should be called.
+ *
+ */
+void btrfs_reserve_chunk_metadata(struct btrfs_trans_handle *trans,
+ bool is_item_insertion)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ u64 bytes;
+
+ if (is_item_insertion)
+ bytes = btrfs_calc_insert_metadata_size(fs_info, 1);
+ else
+ bytes = btrfs_calc_metadata_size(fs_info, 1);
+
+ mutex_lock(&fs_info->chunk_mutex);
+ reserve_chunk_space(trans, bytes, BTRFS_BLOCK_GROUP_SYSTEM);
+ mutex_unlock(&fs_info->chunk_mutex);
+}
+
void btrfs_put_block_group_cache(struct btrfs_fs_info *info)
{
struct btrfs_block_group *block_group;
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 07f977d3816c..5878b7ce3b78 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -293,6 +293,8 @@ int btrfs_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
enum btrfs_chunk_alloc_enum force);
int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans, u64 type);
void check_system_chunk(struct btrfs_trans_handle *trans, const u64 type);
+void btrfs_reserve_chunk_metadata(struct btrfs_trans_handle *trans,
+ bool is_item_insertion);
u64 btrfs_get_alloc_profile(struct btrfs_fs_info *fs_info, u64 orig_flags);
void btrfs_put_block_group_cache(struct btrfs_fs_info *info);
int btrfs_free_block_groups(struct btrfs_fs_info *info);
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index fed823596248..33a0ee7ac590 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2692,8 +2692,12 @@ static int relocate_tree_block(struct btrfs_trans_handle *trans,
list_add_tail(&node->list, &rc->backref_cache.changed);
} else {
path->lowest_level = node->level;
+ if (root == root->fs_info->chunk_root)
+ btrfs_reserve_chunk_metadata(trans, false);
ret = btrfs_search_slot(trans, root, key, path, 0, 1);
btrfs_release_path(path);
+ if (root == root->fs_info->chunk_root)
+ btrfs_trans_release_chunk_metadata(trans);
if (ret > 0)
ret = 0;
}
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index debba6f04858..9eab8a741166 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1847,8 +1847,10 @@ static int btrfs_add_dev_item(struct btrfs_trans_handle *trans,
key.type = BTRFS_DEV_ITEM_KEY;
key.offset = device->devid;
+ btrfs_reserve_chunk_metadata(trans, true);
ret = btrfs_insert_empty_item(trans, trans->fs_info->chunk_root, path,
&key, sizeof(*dev_item));
+ btrfs_trans_release_chunk_metadata(trans);
if (ret)
goto out;
@@ -1921,7 +1923,9 @@ static int btrfs_rm_dev_item(struct btrfs_device *device)
key.type = BTRFS_DEV_ITEM_KEY;
key.offset = device->devid;
+ btrfs_reserve_chunk_metadata(trans, false);
ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
+ btrfs_trans_release_chunk_metadata(trans);
if (ret) {
if (ret > 0)
ret = -ENOENT;
@@ -2513,7 +2517,9 @@ static int btrfs_finish_sprout(struct btrfs_trans_handle *trans)
key.type = BTRFS_DEV_ITEM_KEY;
while (1) {
+ btrfs_reserve_chunk_metadata(trans, false);
ret = btrfs_search_slot(trans, root, &key, path, 0, 1);
+ btrfs_trans_release_chunk_metadata(trans);
if (ret < 0)
goto error;
@@ -2862,6 +2868,7 @@ int btrfs_grow_device(struct btrfs_trans_handle *trans,
struct btrfs_super_block *super_copy = fs_info->super_copy;
u64 old_total;
u64 diff;
+ int ret;
if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state))
return -EACCES;
@@ -2890,7 +2897,11 @@ int btrfs_grow_device(struct btrfs_trans_handle *trans,
&trans->transaction->dev_update_list);
mutex_unlock(&fs_info->chunk_mutex);
- return btrfs_update_device(trans, device);
+ btrfs_reserve_chunk_metadata(trans, false);
+ ret = btrfs_update_device(trans, device);
+ btrfs_trans_release_chunk_metadata(trans);
+
+ return ret;
}
static int btrfs_free_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
@@ -4925,8 +4936,10 @@ again:
round_down(old_total - diff, fs_info->sectorsize));
mutex_unlock(&fs_info->chunk_mutex);
+ btrfs_reserve_chunk_metadata(trans, false);
/* Now btrfs_update_device() will change the on-disk size. */
ret = btrfs_update_device(trans, device);
+ btrfs_trans_release_chunk_metadata(trans);
if (ret < 0) {
btrfs_abort_transaction(trans, ret);
btrfs_end_transaction(trans);
> commit 8615ff6dd1ac9e01b6fcf0fc0652353f79f524ed
> Author: Yang Shi <shy828301(a)gmail.com>
> Date: Thu Oct 28 14:36:11 2021 -0700
>
> mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
>
> commit eac96c3efdb593df1a57bb5b95dbe037bfa9a522 upstream.
For the sake of testing,
other than this breaking systemd-journal,
postgresql is another service that would hang forever with 100% CPU,
on arm64 (odroid-c4) using Ubuntu 20.04.
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b968e84b509da593c50dc3db679e1d33de701f78 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Fri, 17 Sep 2021 11:20:04 +0200
Subject: [PATCH] x86/iopl: Fake iopl(3) CLI/STI usage
Since commit c8137ace5638 ("x86/iopl: Restrict iopl() permission
scope") it's possible to emulate iopl(3) using ioperm(), except for
the CLI/STI usage.
Userspace CLI/STI usage is very dubious (read broken), since any
exception taken during that window can lead to rescheduling anyway (or
worse). The IOPL(2) manpage even states that usage of CLI/STI is highly
discouraged and might even crash the system.
Of course, that won't stop people and HP has the dubious honour of
being the first vendor to be found using this in their hp-health
package.
In order to enable this 'software' to still 'work', have the #GP treat
the CLI/STI instructions as NOPs when iopl(3). Warn the user that
their program is doing dubious things.
Fixes: a24ca9976843 ("x86/iopl: Remove legacy IOPL option")
Reported-by: Ondrej Zary <linux(a)zary.sk>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)kernel.org # v5.5+
Link: https://lkml.kernel.org/r/20210918090641.GD5106@worktop.programming.kicks-a…
diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 91d7182ad2d6..4ec3613551e3 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -21,6 +21,7 @@ int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
int insn_get_modrm_reg_off(struct insn *insn, struct pt_regs *regs);
unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx);
int insn_get_code_seg_params(struct pt_regs *regs);
+int insn_get_effective_ip(struct pt_regs *regs, unsigned long *ip);
int insn_fetch_from_user(struct pt_regs *regs,
unsigned char buf[MAX_INSN_SIZE]);
int insn_fetch_from_user_inatomic(struct pt_regs *regs,
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 9ad2acaaae9b..577f342dbfb2 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -518,6 +518,7 @@ struct thread_struct {
*/
unsigned long iopl_emul;
+ unsigned int iopl_warn:1;
unsigned int sig_on_uaccess_err:1;
/*
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 1d9463e3096b..f2f733bcb2b9 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -132,6 +132,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
frame->ret_addr = (unsigned long) ret_from_fork;
p->thread.sp = (unsigned long) fork_frame;
p->thread.io_bitmap = NULL;
+ p->thread.iopl_warn = 0;
memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
#ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index a58800973aed..f3f3034b06f3 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -528,6 +528,36 @@ static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
#define GPFSTR "general protection fault"
+static bool fixup_iopl_exception(struct pt_regs *regs)
+{
+ struct thread_struct *t = ¤t->thread;
+ unsigned char byte;
+ unsigned long ip;
+
+ if (!IS_ENABLED(CONFIG_X86_IOPL_IOPERM) || t->iopl_emul != 3)
+ return false;
+
+ if (insn_get_effective_ip(regs, &ip))
+ return false;
+
+ if (get_user(byte, (const char __user *)ip))
+ return false;
+
+ if (byte != 0xfa && byte != 0xfb)
+ return false;
+
+ if (!t->iopl_warn && printk_ratelimit()) {
+ pr_err("%s[%d] attempts to use CLI/STI, pretending it's a NOP, ip:%lx",
+ current->comm, task_pid_nr(current), ip);
+ print_vma_addr(KERN_CONT " in ", ip);
+ pr_cont("\n");
+ t->iopl_warn = 1;
+ }
+
+ regs->ip += 1;
+ return true;
+}
+
DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
{
char desc[sizeof(GPFSTR) + 50 + 2*sizeof(unsigned long) + 1] = GPFSTR;
@@ -553,6 +583,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
tsk = current;
if (user_mode(regs)) {
+ if (fixup_iopl_exception(regs))
+ goto exit;
+
tsk->thread.error_code = error_code;
tsk->thread.trap_nr = X86_TRAP_GP;
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index a1d24fdc07cf..eb3ccffb9b9d 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -1417,7 +1417,7 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
}
}
-static int insn_get_effective_ip(struct pt_regs *regs, unsigned long *ip)
+int insn_get_effective_ip(struct pt_regs *regs, unsigned long *ip)
{
unsigned long seg_base = 0;
The patch below does not apply to the 5.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 432cce21b66c49b1951d046d3ffc5d52fdad6265 Mon Sep 17 00:00:00 2001
From: Sachi King <nakato(a)nakato.io>
Date: Sat, 2 Oct 2021 14:18:39 +1000
Subject: [PATCH] platform/x86: amd-pmc: Add alternative acpi id for PMC
controller
The Surface Laptop 4 AMD has used the AMD0005 to identify this
controller instead of using the appropriate ACPI ID AMDI0005. Include
AMD0005 in the acpi id list.
Link: https://github.com/linux-surface/acpidumps/tree/master/surface_laptop_4_amd
Link: https://gist.github.com/nakato/2a1a7df1a45fe680d7a08c583e1bf863
Cc: <stable(a)vger.kernel.org> # 5.14+
Signed-off-by: Sachi King <nakato(a)nakato.io>
Reviewed-by: Mario Limonciello <mario.limonciello(a)amd.com>
Link: https://lore.kernel.org/r/20211002041840.2058647-1-nakato@nakato.io
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
diff --git a/drivers/platform/x86/amd-pmc.c b/drivers/platform/x86/amd-pmc.c
index 548432a2ea65..b7b6ad4629a7 100644
--- a/drivers/platform/x86/amd-pmc.c
+++ b/drivers/platform/x86/amd-pmc.c
@@ -555,6 +555,7 @@ static const struct acpi_device_id amd_pmc_acpi_ids[] = {
{"AMDI0006", 0},
{"AMDI0007", 0},
{"AMD0004", 0},
+ {"AMD0005", 0},
{ }
};
MODULE_DEVICE_TABLE(acpi, amd_pmc_acpi_ids);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 432cce21b66c49b1951d046d3ffc5d52fdad6265 Mon Sep 17 00:00:00 2001
From: Sachi King <nakato(a)nakato.io>
Date: Sat, 2 Oct 2021 14:18:39 +1000
Subject: [PATCH] platform/x86: amd-pmc: Add alternative acpi id for PMC
controller
The Surface Laptop 4 AMD has used the AMD0005 to identify this
controller instead of using the appropriate ACPI ID AMDI0005. Include
AMD0005 in the acpi id list.
Link: https://github.com/linux-surface/acpidumps/tree/master/surface_laptop_4_amd
Link: https://gist.github.com/nakato/2a1a7df1a45fe680d7a08c583e1bf863
Cc: <stable(a)vger.kernel.org> # 5.14+
Signed-off-by: Sachi King <nakato(a)nakato.io>
Reviewed-by: Mario Limonciello <mario.limonciello(a)amd.com>
Link: https://lore.kernel.org/r/20211002041840.2058647-1-nakato@nakato.io
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
diff --git a/drivers/platform/x86/amd-pmc.c b/drivers/platform/x86/amd-pmc.c
index 548432a2ea65..b7b6ad4629a7 100644
--- a/drivers/platform/x86/amd-pmc.c
+++ b/drivers/platform/x86/amd-pmc.c
@@ -555,6 +555,7 @@ static const struct acpi_device_id amd_pmc_acpi_ids[] = {
{"AMDI0006", 0},
{"AMDI0007", 0},
{"AMD0004", 0},
+ {"AMD0005", 0},
{ }
};
MODULE_DEVICE_TABLE(acpi, amd_pmc_acpi_ids);