The following commit has been merged into the perf/urgent branch of tip:
Commit-ID: ce0d998be9274dd3a3d971cbeaa6fe28fd2c3062
Gitweb: https://git.kernel.org/tip/ce0d998be9274dd3a3d971cbeaa6fe28fd2c3062
Author: Adrian Hunter <adrian.hunter(a)intel.com>
AuthorDate: Sat, 12 Nov 2022 17:15:08 +02:00
Committer: Peter Zijlstra <peterz(a)infradead.org>
CommitterDate: Wed, 16 Nov 2022 10:12:59 +01:00
perf/x86/intel/pt: Fix sampling using single range output
Deal with errata TGL052, ADL037 and RPL017, "Trace May Contain Incorrect
Data When Configured With Single Range Output Larger Than 4KB", by
disabling single range output whenever it is larger than 4KB.
Fixes: 670638477aed ("perf/x86/intel/pt: Opportunistically use single range output mode")
Signed-off-by: Adrian Hunter <adrian.hunter(a)intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/20221112151508.13768-1-adrian.hunter@intel.com
---
arch/x86/events/intel/pt.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 82ef87e..42a5579 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -1263,6 +1263,15 @@ static int pt_buffer_try_single(struct pt_buffer *buf, int nr_pages)
if (1 << order != nr_pages)
goto out;
+ /*
+ * Some processors cannot always support single range for more than
+ * 4KB - refer errata TGL052, ADL037 and RPL017. Future processors might
+ * also be affected, so for now rather than trying to keep track of
+ * which ones, just disable it for all.
+ */
+ if (nr_pages > 1)
+ goto out;
+
buf->single = true;
buf->nr_pages = nr_pages;
ret = 0;
This patch fixes a VRAM BO eviction issue during resume when playing the
Steam game Cuphead.
During PSP resume, a VRAM buffer of size 10240 KiB is requested for the
trusted memory region (TMR). As part of this memory allocation we try to
evict a few user buffers from VRAM to the SYSTEM domain, but the eviction
fails because the selected resource doesn't have contiguous blocks. Hence,
the TMR memory request fails and the system gets stuck in the resume
process.
This change skips resources with non-contiguous blocks and moves on to the
next available resource until it finds one whose blocks are contiguous;
that resource is then moved from VRAM to the SYSTEM domain, the TMR
allocation in VRAM succeeds, and the system comes out of resume.
Gitlab issue link: https://gitlab.freedesktop.org/drm/amd/-/issues/2213
v2: Added issue link and fixes tag.
Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu")
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam(a)amd.com>
Cc: stable(a)vger.kernel.org #6.0
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index aea8d26b1724..1964de6ac997 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1369,6 +1369,10 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
amdgpu_bo_encrypted(ttm_to_amdgpu_bo(bo)))
return false;
+ if (bo->resource->mem_type == TTM_PL_VRAM &&
+ !(bo->resource->placement & TTM_PL_FLAG_CONTIGUOUS))
+ return false;
+
return ttm_bo_eviction_valuable(bo, place);
}
--
2.25.1
From: Wyes Karny <wyes.karny(a)amd.com>
The MSR_AMD_PERF_CTL register should be reset while initializing the
amd_pstate driver. On a running system, if we switch the cpufreq driver
from acpi_cpufreq to amd_pstate, we see the following issue: there is no
frequency change on cores which were previously running at a non-zero
pstate. Hence, reset the pstate to zero on all cores during the
initialization of the amd_pstate driver to avoid a performance drop.
While the AMD_PERF_CTL MSR is guaranteed to be set to 0 on a cold boot,
the same is not true for a kexec boot, which is a very common way of
switching kernels/rebooting in MDCs and on Ubuntu.
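As a quick way to see the stale value, a minimal user-space sketch
(illustration only, not part of this patch) can read MSR_AMD_PERF_CTL
after a kexec; it assumes the msr driver is loaded, uses the 0xc0010062
address from msr-index.h, and needs root privileges:

  #include <fcntl.h>
  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      uint64_t val;
      /* requires the msr driver (CONFIG_X86_MSR) and root privileges */
      int fd = open("/dev/cpu/0/msr", O_RDONLY);

      if (fd < 0 || pread(fd, &val, sizeof(val), 0xc0010062) != sizeof(val)) {
          perror("MSR_AMD_PERF_CTL");
          return 1;
      }
      printf("MSR_AMD_PERF_CTL (cpu0) = %#" PRIx64 "\n", val);
      return 0;
  }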
Signed-off-by: Wyes Karny <wyes.karny(a)amd.com>
Signed-off-by: Perry Yuan <Perry.Yuan(a)amd.com>
Cc: stable(a)vger.kernel.org
---
drivers/cpufreq/amd-pstate.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 1ffb97b6dbe2..0057ad5dfa97 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -420,12 +420,18 @@ static void amd_pstate_boost_init(struct amd_cpudata *cpudata)
amd_pstate_driver.boost_enabled = true;
}
+static void amd_perf_ctl_reset(unsigned int cpu)
+{
+ wrmsrl_on_cpu(cpu, MSR_AMD_PERF_CTL, 0);
+}
+
static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
{
int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
struct device *dev;
struct amd_cpudata *cpudata;
+ amd_perf_ctl_reset(policy->cpu);
dev = get_cpu_device(policy->cpu);
if (!dev)
return -ENODEV;
--
2.25.1
The hardware XRSTOR instruction resets the PKRU register to its hardware
init value (namely 0) if the PKRU bit is not set in the xfeatures mask.
Emulating that here restores the pre-5.14 behavior for PTRACE_SETREGSET
with NT_X86_XSTATE, and makes sigreturn (which still uses XRSTOR) and
ptrace behave identically. KVM has never used XRSTOR and never had this
behavior, so KVM opts out of this emulation by passing a NULL pkru pointer
to copy_uabi_to_xstate().
Fixes: e84ba47e313d ("x86/fpu: Hook up PKRU into ptrace()")
Signed-off-by: Kyle Huey <me(a)kylehuey.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: stable(a)vger.kernel.org # 5.14+
---
arch/x86/kernel/fpu/core.c | 8 ++++++++
arch/x86/kernel/fpu/xstate.c | 15 ++++++++++++++-
2 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 46b935bc87c8..8d0f6019c21d 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -404,6 +404,14 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf,
if (ustate->xsave.header.xfeatures & ~xcr0)
return -EINVAL;
+ /*
+ * Nullify @vpkru to preserve its current value if PKRU's bit isn't set
+ * in the header. KVM's odd ABI is to leave PKRU untouched in this
+ * case (all other components are eventually re-initialized).
+ */
+ if (!(ustate->xsave.header.xfeatures & XFEATURE_MASK_PKRU))
+ vpkru = NULL;
+
return copy_uabi_from_kernel_to_xstate(kstate, ustate, vpkru);
}
EXPORT_SYMBOL_GPL(fpu_copy_uabi_to_guest_fpstate);
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index bebc30c29ed3..193c6e95daa8 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1219,8 +1219,14 @@ static int copy_from_buffer(void *dst, unsigned int offset, unsigned int size,
* it is harmless.
* 2. When called from ptrace the PKRU register will be restored from the
* thread_struct's pkru field. A pointer to that is passed in @pkru.
+ * The kernel will restore it manually, so the XRSTOR behavior that resets
+ * the PKRU register to the hardware init value (0) if the corresponding
+ * xfeatures bit is not set is emulated here.
* 3. When called from KVM the PKRU register will be restored from the vcpu's
- * pkru field. A pointer to that is passed in @pkru.
+ * pkru field. A pointer to that is passed in @pkru. KVM hasn't used
+ * XRSTOR and hasn't had the PKRU resetting behavior described above. To
+ * preserve that KVM behavior, it passes NULL for @pkru if the xfeatures
+ * bit is not set.
*/
static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
const void __user *ubuf, u32 *pkru)
@@ -1277,6 +1283,13 @@ static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
xpkru = __raw_xsave_addr(xsave, XFEATURE_PKRU);
*pkru = xpkru->pkru;
+ } else {
+ /*
+ * KVM may pass NULL here to indicate that it does not need
+ * PKRU updated.
+ */
+ if (pkru)
+ *pkru = 0;
}
/*
--
2.38.1
Move KVM's PKRU handling code in fpu_copy_uabi_to_guest_fpstate() to
copy_uabi_to_xstate() so that it is shared with other APIs that write the
XSTATE such as PTRACE_SETREGSET with NT_X86_XSTATE.
This restores the pre-5.14 behavior of ptrace. The regression can be seen
by running gdb and executing `p $pkru`, `set $pkru = 42`, and `p $pkru`.
On affected kernels (5.14+) the write to the PKRU register (which gdb
performs through ptrace) is ignored.
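For illustration only (not part of this patch), a rough ptrace sketch of
the path gdb exercises; it assumes an x86-64 CPU with protection keys,
glibc's pkey_alloc(), and takes the PKRU offset from CPUID leaf 0xD,
subleaf 9 (error handling omitted):

  /* Sketch: mirrors what gdb does via ptrace; on affected kernels the
   * value read back stays unchanged instead of becoming 42. */
  #define _GNU_SOURCE
  #include <cpuid.h>
  #include <elf.h>
  #include <signal.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <sys/mman.h>
  #include <sys/ptrace.h>
  #include <sys/uio.h>
  #include <sys/wait.h>
  #include <unistd.h>

  int main(void)
  {
      static uint8_t buf[16384];               /* oversized xstate buffer */
      struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
      unsigned int eax, ebx, ecx, edx;
      pid_t pid = fork();

      if (pid == 0) {
          pkey_alloc(0, PKEY_DISABLE_WRITE);   /* make PKRU non-init */
          ptrace(PTRACE_TRACEME, 0, NULL, NULL);
          raise(SIGSTOP);
          _exit(0);
      }
      waitpid(pid, NULL, 0);

      /* offset of the PKRU component in the uabi xstate buffer */
      __get_cpuid_count(0xd, 9, &eax, &ebx, &ecx, &edx);

      ptrace(PTRACE_GETREGSET, pid, (void *)NT_X86_XSTATE, &iov);
      *(uint32_t *)(buf + ebx) = 42;           /* like "set $pkru = 42" */
      ptrace(PTRACE_SETREGSET, pid, (void *)NT_X86_XSTATE, &iov);

      ptrace(PTRACE_GETREGSET, pid, (void *)NT_X86_XSTATE, &iov);
      printf("PKRU after SETREGSET: %u\n", *(uint32_t *)(buf + ebx));

      ptrace(PTRACE_KILL, pid, NULL, NULL);
      return 0;
  }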
Fixes: e84ba47e313d ("x86/fpu: Hook up PKRU into ptrace()")
Signed-off-by: Kyle Huey <me(a)kylehuey.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: stable(a)vger.kernel.org # 5.14+
---
arch/x86/kernel/fpu/core.c | 13 +------------
arch/x86/kernel/fpu/xstate.c | 21 ++++++++++++++++++++-
2 files changed, 21 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 550157686323..46b935bc87c8 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -391,8 +391,6 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf,
{
struct fpstate *kstate = gfpu->fpstate;
const union fpregs_state *ustate = buf;
- struct pkru_state *xpkru;
- int ret;
if (!cpu_feature_enabled(X86_FEATURE_XSAVE)) {
if (ustate->xsave.header.xfeatures & ~XFEATURE_MASK_FPSSE)
@@ -406,16 +404,7 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf,
if (ustate->xsave.header.xfeatures & ~xcr0)
return -EINVAL;
- ret = copy_uabi_from_kernel_to_xstate(kstate, ustate, vpkru);
- if (ret)
- return ret;
-
- /* Retrieve PKRU if not in init state */
- if (kstate->regs.xsave.header.xfeatures & XFEATURE_MASK_PKRU) {
- xpkru = get_xsave_addr(&kstate->regs.xsave, XFEATURE_PKRU);
- *vpkru = xpkru->pkru;
- }
- return 0;
+ return copy_uabi_from_kernel_to_xstate(kstate, ustate, vpkru);
}
EXPORT_SYMBOL_GPL(fpu_copy_uabi_to_guest_fpstate);
#endif /* CONFIG_KVM */
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 3a6ced76e932..bebc30c29ed3 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1205,10 +1205,22 @@ static int copy_from_buffer(void *dst, unsigned int offset, unsigned int size,
* @fpstate: The fpstate buffer to copy to
* @kbuf: The UABI format buffer, if it comes from the kernel
* @ubuf: The UABI format buffer, if it comes from userspace
- * @pkru: unused
+ * @pkru: The location to write the PKRU value to
*
* Converts from the UABI format into the kernel internal hardware
* dependent format.
+ *
+ * This function ultimately has three different callers with distinct PKRU
+ * behavior.
+ * 1. When called from sigreturn the PKRU register will be restored from
+ * @fpstate via an XRSTOR. Correctly copying the UABI format buffer to
+ * @fpstate is sufficient to cover this case, but the caller will also
+ * pass a pointer to the thread_struct's pkru field in @pkru and updating
+ * it is harmless.
+ * 2. When called from ptrace the PKRU register will be restored from the
+ * thread_struct's pkru field. A pointer to that is passed in @pkru.
+ * 3. When called from KVM the PKRU register will be restored from the vcpu's
+ * pkru field. A pointer to that is passed in @pkru.
*/
static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
const void __user *ubuf, u32 *pkru)
@@ -1260,6 +1272,13 @@ static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
}
}
+ if (hdr.xfeatures & XFEATURE_MASK_PKRU) {
+ struct pkru_state *xpkru;
+
+ xpkru = __raw_xsave_addr(xsave, XFEATURE_PKRU);
+ *pkru = xpkru->pkru;
+ }
+
/*
* The state that came in from userspace was user-state only.
* Mask all the user states out of 'xfeatures':
--
2.38.1
Ives van Hoorne from codesandbox.io reported an issue regarding possible
data loss of uffd-wp when applied to memfds on heavily loaded systems. The
symptom is that some pages read from the snapshot child VMs have
mismatched data.
Here I can also reproduce it with a Rust reproducer provided by Ives that
keeps taking snapshots of a 256MB VM; on a 32G system, when I initiate 80
instances I can trigger the issue within ten minutes.
It turns out that some pages can be written through even if uffd-wp is
applied to the pte.
The problem is, when removing migration entries, we didn't really worry
about the write bit as long as we knew it was not a write migration entry.
That may not be true: for some memory types (e.g. writable shmem) mk_pte
can return a pte with the write bit set, so to recover the migration entry
to its original state we need to explicitly wr-protect the pte, otherwise
it will have the write bit set even if it is a read migration entry.
For uffd this can cause writes to go through. I didn't verify it, but I
think it will be the same for mprotect()ed pages, and after migration we
can miss the sigbus instead.
The relevant code on uffd was introduced in the anon support, which is
commit f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration",
2020-04-07). However, anon shouldn't suffer from this problem because anon
pages should always have the write bit cleared, so that may not be a
proper Fixes target. To satisfy the need for backporting, I'm attaching
the Fixes tag to the uffd-wp shmem support. Since no one has had an issue
with mprotect, I assume that's also the kernel version we should start
backporting to for stable, and we shouldn't need to worry about anything
before that.
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: stable(a)vger.kernel.org
Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs")
Reported-by: Ives van Hoorne <ives(a)codesandbox.io>
Signed-off-by: Peter Xu <peterx(a)redhat.com>
---
mm/migrate.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index dff333593a8a..8b6351c08c78 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -213,8 +213,14 @@ static bool remove_migration_pte(struct folio *folio,
pte = pte_mkdirty(pte);
if (is_writable_migration_entry(entry))
pte = maybe_mkwrite(pte, vma);
- else if (pte_swp_uffd_wp(*pvmw.pte))
+ else
+ /* NOTE: mk_pte can have write bit set */
+ pte = pte_wrprotect(pte);
+
+ if (pte_swp_uffd_wp(*pvmw.pte)) {
+ WARN_ON_ONCE(pte_write(pte));
pte = pte_mkuffd_wp(pte);
+ }
if (folio_test_anon(folio) && !is_readable_migration_entry(entry))
rmap_flags |= RMAP_EXCLUSIVE;
--
2.37.3
From: Frieder Schrempf <frieder.schrempf(a)kontron.de>
In case the requested bus clock is higher than the input clock, the correct
dividers (pre = 0, post = 0) are returned from mx51_ecspi_clkdiv(), but
*fres is left uninitialized and therefore contains an arbitrary value.
This causes trouble for the recently introduced PIO polling feature as the
value in spi_imx->spi_bus_clk is used there to calculate for which
transfers to enable PIO polling.
Fix this by setting *fres even if no clock dividers are in use.
This issue was observed on Kontron BL i.MX8MM with an SPI peripheral clock set
to 50 MHz by default and a requested SPI bus clock of 80 MHz for the SPI NOR
flash.
With the fix applied the debug message from mx51_ecspi_clkdiv() now prints the
following:
spi_imx 30820000.spi: mx51_ecspi_clkdiv: fin: 50000000, fspi: 50000000,
post: 0, pre: 0
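For clarity: with fspi clamped to fin (50000000), fls(fin) - fls(fspi) = 0,
so post = 0 and pre = 0; the resulting bus clock equals the 50 MHz input
clock, and *fres now reports that value instead of stale data.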
Fixes: 07e759387788 ("spi: spi-imx: add PIO polling support")
Cc: Marc Kleine-Budde <mkl(a)pengutronix.de>
Cc: David Jander <david(a)protonic.nl>
Cc: Fabio Estevam <festevam(a)gmail.com>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Marek Vasut <marex(a)denx.de>
Cc: stable(a)vger.kernel.org
Signed-off-by: Frieder Schrempf <frieder.schrempf(a)kontron.de>
---
Changes for v2:
* Remove the reference and the Fixes tag for commit 6fd8b8503a0d as it is
incorrect.
---
drivers/spi/spi-imx.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index 30d82cc7300b..468ce0a2b282 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -444,8 +444,7 @@ static unsigned int mx51_ecspi_clkdiv(struct spi_imx_data *spi_imx,
unsigned int pre, post;
unsigned int fin = spi_imx->spi_clk;
- if (unlikely(fspi > fin))
- return 0;
+ fspi = min(fspi, fin);
post = fls(fin) - fls(fspi);
if (fin > fspi << post)
--
2.38.1
commit 21bb8af513d35c005c401706030f4eb469538d1d upstream.
Move generic non-atomic bitops from the asm-generic header which
gets included only when there are no architecture-specific
alternatives, to a separate independent file to make them always
available.
Almost no actual code changes, only one comment added to
generic_test_bit() saying that it's an atomic operation itself
and thus `volatile` must always stay there with no cast-aways.
Suggested-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com> # comment
Suggested-by: Marco Elver <elver(a)google.com> # reference to kernel-doc
Signed-off-by: Alexander Lobakin <alexandr.lobakin(a)intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Reviewed-by: Marco Elver <elver(a)google.com>
Signed-off-by: Yury Norov <yury.norov(a)gmail.com>
---
include/asm-generic/bitops/generic-non-atomic.h | 130 ++++++++++++++++++++++++
include/asm-generic/bitops/non-atomic.h | 110 +-------------------
2 files changed, 138 insertions(+), 102 deletions(-)
Index: linux-stable/include/asm-generic/bitops/generic-non-atomic.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-stable/include/asm-generic/bitops/generic-non-atomic.h 2022-11-14 22:17:30.000000000 +0100
@@ -0,0 +1,130 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
+#define __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
+
+#include <linux/bits.h>
+
+#ifndef _LINUX_BITOPS_H
+#error only <linux/bitops.h> can be included directly
+#endif
+
+/*
+ * Generic definitions for bit operations, should not be used in regular code
+ * directly.
+ */
+
+/**
+ * generic___set_bit - Set a bit in memory
+ * @nr: the bit to set
+ * @addr: the address to start counting from
+ *
+ * Unlike set_bit(), this function is non-atomic and may be reordered.
+ * If it's called on the same region of memory simultaneously, the effect
+ * may be that only one operation succeeds.
+ */
+static __always_inline void
+generic___set_bit(unsigned int nr, volatile unsigned long *addr)
+{
+ unsigned long mask = BIT_MASK(nr);
+ unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+ *p |= mask;
+}
+
+static __always_inline void
+generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
+{
+ unsigned long mask = BIT_MASK(nr);
+ unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+ *p &= ~mask;
+}
+
+/**
+ * generic___change_bit - Toggle a bit in memory
+ * @nr: the bit to change
+ * @addr: the address to start counting from
+ *
+ * Unlike change_bit(), this function is non-atomic and may be reordered.
+ * If it's called on the same region of memory simultaneously, the effect
+ * may be that only one operation succeeds.
+ */
+static __always_inline
+void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
+{
+ unsigned long mask = BIT_MASK(nr);
+ unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+ *p ^= mask;
+}
+
+/**
+ * generic___test_and_set_bit - Set a bit and return its old value
+ * @nr: Bit to set
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to succeed
+ * but actually fail. You must protect multiple accesses with a lock.
+ */
+static __always_inline int
+generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
+{
+ unsigned long mask = BIT_MASK(nr);
+ unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+ unsigned long old = *p;
+
+ *p = old | mask;
+ return (old & mask) != 0;
+}
+
+/**
+ * generic___test_and_clear_bit - Clear a bit and return its old value
+ * @nr: Bit to clear
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to succeed
+ * but actually fail. You must protect multiple accesses with a lock.
+ */
+static __always_inline int
+generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
+{
+ unsigned long mask = BIT_MASK(nr);
+ unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+ unsigned long old = *p;
+
+ *p = old & ~mask;
+ return (old & mask) != 0;
+}
+
+/* WARNING: non atomic and it can be reordered! */
+static __always_inline int
+generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
+{
+ unsigned long mask = BIT_MASK(nr);
+ unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+ unsigned long old = *p;
+
+ *p = old ^ mask;
+ return (old & mask) != 0;
+}
+
+/**
+ * generic_test_bit - Determine whether a bit is set
+ * @nr: bit number to test
+ * @addr: Address to start counting from
+ */
+static __always_inline int
+generic_test_bit(unsigned int nr, const volatile unsigned long *addr)
+{
+ /*
+ * Unlike the bitops with the '__' prefix above, this one *is* atomic,
+ * so `volatile` must always stay here with no cast-aways. See
+ * `Documentation/atomic_bitops.txt` for the details.
+ */
+ return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
+}
+
+#endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
Index: linux-stable/include/asm-generic/bitops/non-atomic.h
===================================================================
--- linux-stable.orig/include/asm-generic/bitops/non-atomic.h 2022-11-14 22:17:30.000000000 +0100
+++ linux-stable/include/asm-generic/bitops/non-atomic.h 2022-11-14 22:17:30.000000000 +0100
@@ -2,121 +2,27 @@
#ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
#define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
-#include <asm/types.h>
+#include <asm-generic/bitops/generic-non-atomic.h>
-/**
- * arch___set_bit - Set a bit in memory
- * @nr: the bit to set
- * @addr: the address to start counting from
- *
- * Unlike set_bit(), this function is non-atomic and may be reordered.
- * If it's called on the same region of memory simultaneously, the effect
- * may be that only one operation succeeds.
- */
-static __always_inline void
-arch___set_bit(unsigned int nr, volatile unsigned long *addr)
-{
- unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-
- *p |= mask;
-}
+#define arch___set_bit generic___set_bit
#define __set_bit arch___set_bit
-static __always_inline void
-arch___clear_bit(unsigned int nr, volatile unsigned long *addr)
-{
- unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-
- *p &= ~mask;
-}
+#define arch___clear_bit generic___clear_bit
#define __clear_bit arch___clear_bit
-/**
- * arch___change_bit - Toggle a bit in memory
- * @nr: the bit to change
- * @addr: the address to start counting from
- *
- * Unlike change_bit(), this function is non-atomic and may be reordered.
- * If it's called on the same region of memory simultaneously, the effect
- * may be that only one operation succeeds.
- */
-static __always_inline
-void arch___change_bit(unsigned int nr, volatile unsigned long *addr)
-{
- unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-
- *p ^= mask;
-}
+#define arch___change_bit generic___change_bit
#define __change_bit arch___change_bit
-/**
- * arch___test_and_set_bit - Set a bit and return its old value
- * @nr: Bit to set
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail. You must protect multiple accesses with a lock.
- */
-static __always_inline int
-arch___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
-{
- unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long old = *p;
-
- *p = old | mask;
- return (old & mask) != 0;
-}
+#define arch___test_and_set_bit generic___test_and_set_bit
#define __test_and_set_bit arch___test_and_set_bit
-/**
- * arch___test_and_clear_bit - Clear a bit and return its old value
- * @nr: Bit to clear
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail. You must protect multiple accesses with a lock.
- */
-static __always_inline int
-arch___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
-{
- unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long old = *p;
-
- *p = old & ~mask;
- return (old & mask) != 0;
-}
+#define arch___test_and_clear_bit generic___test_and_clear_bit
#define __test_and_clear_bit arch___test_and_clear_bit
-/* WARNING: non atomic and it can be reordered! */
-static __always_inline int
-arch___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
-{
- unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long old = *p;
-
- *p = old ^ mask;
- return (old & mask) != 0;
-}
+#define arch___test_and_change_bit generic___test_and_change_bit
#define __test_and_change_bit arch___test_and_change_bit
-/**
- * arch_test_bit - Determine whether a bit is set
- * @nr: bit number to test
- * @addr: Address to start counting from
- */
-static __always_inline int
-arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
-{
- return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
-}
+#define arch_test_bit generic_test_bit
#define test_bit arch_test_bit
#endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
From: Frieder Schrempf <frieder.schrempf(a)kontron.de>
In case the requested bus clock is higher than the input clock, the correct
dividers (pre = 0, post = 0) are returned from mx51_ecspi_clkdiv(), but
*fres is left uninitialized and therefore contains an arbitrary value.
This causes trouble for the recently introduced PIO polling feature as the
value in spi_imx->spi_bus_clk is used there to calculate for which
transfers to enable PIO polling.
Less seriously but also incorrectly, it causes the calculation of the delay
for the propagation of register changes, added in
commit 6fd8b8503a0d ("spi: spi-imx: Fix out-of-order CS/SCLK operation at low speeds"),
to be wrong, because the potentially higher requested bus clock is used
instead of the actual bus clock.
Fix this by setting *fres even if no clock dividers are in use.
This issue was observed on Kontron BL i.MX8MM with an SPI peripheral clock set
to 50 MHz by default and a requested SPI bus clock of 80 MHz for the SPI NOR
flash.
With the fix applied the debug message from mx51_ecspi_clkdiv() now prints the
following:
spi_imx 30820000.spi: mx51_ecspi_clkdiv: fin: 50000000, fspi: 50000000,
post: 0, pre: 0
Fixes: 6fd8b8503a0d ("spi: spi-imx: Fix out-of-order CS/SCLK operation at low speeds")
Fixes: 07e759387788 ("spi: spi-imx: add PIO polling support")
Cc: Marc Kleine-Budde <mkl(a)pengutronix.de>
Cc: David Jander <david(a)protonic.nl>
Cc: Fabio Estevam <festevam(a)gmail.com>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Marek Vasut <marex(a)denx.de>
Cc: stable(a)vger.kernel.org
Signed-off-by: Frieder Schrempf <frieder.schrempf(a)kontron.de>
---
drivers/spi/spi-imx.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index 30d82cc7300b..468ce0a2b282 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -444,8 +444,7 @@ static unsigned int mx51_ecspi_clkdiv(struct spi_imx_data *spi_imx,
unsigned int pre, post;
unsigned int fin = spi_imx->spi_clk;
- if (unlikely(fspi > fin))
- return 0;
+ fspi = min(fspi, fin);
post = fls(fin) - fls(fspi);
if (fin > fspi << post)
--
2.38.1
Since switching to HMM we always need to register an MMU notifier for
userptr BOs, because we no longer grab references to the pages.
Signed-off-by: Christian König <christian.koenig(a)amd.com>
CC: stable(a)vger.kernel.org
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 8ef31d687ef3..111484ceb47d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -413,11 +413,9 @@ int amdgpu_gem_userptr_ioctl(struct drm_device *dev, void *data,
if (r)
goto release_object;
- if (args->flags & AMDGPU_GEM_USERPTR_REGISTER) {
- r = amdgpu_mn_register(bo, args->addr);
- if (r)
- goto release_object;
- }
+ r = amdgpu_mn_register(bo, args->addr);
+ if (r)
+ goto release_object;
if (args->flags & AMDGPU_GEM_USERPTR_VALIDATE) {
r = amdgpu_ttm_tt_get_user_pages(bo, bo->tbo.ttm->pages);
--
2.34.1
The secure update driver depends on the firmware-upload functionality of
the firmware-loader. The firmware-loader is carried in the firmware-class
driver which is enabled with the tristate CONFIG_FW_LOADER option. The
firmware-upload functionality is included in the firmware-class driver if
the bool FW_UPLOAD config is set.
The current dependency statement, "depends on FW_UPLOAD", is not adequate
because it does not implicitly turn on FW_LOADER. Instead of adding a
dependency, follow the convention used by drivers that require the
FW_LOADER_USER_HELPER functionality of the firmware-loader by using
select for both FW_LOADER and FW_UPLOAD.
Fixes: bdf86d0e6ca3 ("fpga: m10bmc-sec: create max10 bmc secure update")
Reported-by: kernel test robot <lkp(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Russ Weight <russell.h.weight(a)intel.com>
---
drivers/fpga/Kconfig | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/fpga/Kconfig b/drivers/fpga/Kconfig
index d1a8107fdcb3..6ce143dafd04 100644
--- a/drivers/fpga/Kconfig
+++ b/drivers/fpga/Kconfig
@@ -246,7 +246,9 @@ config FPGA_MGR_VERSAL_FPGA
config FPGA_M10_BMC_SEC_UPDATE
tristate "Intel MAX10 BMC Secure Update driver"
- depends on MFD_INTEL_M10_BMC && FW_UPLOAD
+ depends on MFD_INTEL_M10_BMC
+ select FW_LOADER
+ select FW_UPLOAD
help
Secure update support for the Intel MAX10 board management
controller.
--
2.25.1
This patch modifies the TD_SIZE field in the TRB before a ZLP TRB.
The TD_SIZE in the TRB preceding a ZLP TRB must be set to 1 to force the
controller to process the ZLP TRB.
cc: <stable(a)vger.kernel.org>
Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
Signed-off-by: Pawel Laszczak <pawell(a)cadence.com>
---
Changelog:
v2:
- returned value for last TRB must be 0
drivers/usb/cdns3/cdnsp-ring.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/cdns3/cdnsp-ring.c b/drivers/usb/cdns3/cdnsp-ring.c
index 04dfcaa08dc4..aa79bce89d8a 100644
--- a/drivers/usb/cdns3/cdnsp-ring.c
+++ b/drivers/usb/cdns3/cdnsp-ring.c
@@ -1769,8 +1769,13 @@ static u32 cdnsp_td_remainder(struct cdnsp_device *pdev,
/* One TRB with a zero-length data packet. */
if (!more_trbs_coming || (transferred == 0 && trb_buff_len == 0) ||
- trb_buff_len == td_total_len)
+ trb_buff_len == td_total_len) {
+ /* Before ZLP driver needs set TD_SIZE=1. */
+ if (more_trbs_coming)
+ return 1;
+
return 0;
+ }
maxp = usb_endpoint_maxp(preq->pep->endpoint.desc);
total_packet_count = DIV_ROUND_UP(td_total_len, maxp);
--
2.25.1
The patch titled
Subject: hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
hugetlb-dont-delete-vma_lock-in-hugetlb-madv_dontneed-processing.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
Date: Mon, 14 Nov 2022 15:55:06 -0800
madvise(MADV_DONTNEED) ends up calling zap_page_range() to clear page
tables associated with the address range. For hugetlb vmas,
zap_page_range will call __unmap_hugepage_range_final. However,
__unmap_hugepage_range_final assumes the passed vma is about to be removed
and deletes the vma_lock to prevent pmd sharing as the vma is on the way
out. In the case of madvise(MADV_DONTNEED) the vma remains, but the
missing vma_lock prevents pmd sharing and could potentially lead to issues
with truncation/fault races.
This issue was originally reported here [1] as a BUG triggered in
page_try_dup_anon_rmap. Prior to the introduction of the hugetlb
vma_lock, __unmap_hugepage_range_final cleared the VM_MAYSHARE flag to
prevent pmd sharing. Subsequent faults on this vma were confused:
VM_MAYSHARE indicates a sharable vma, but since it was not set, page_mapping
was not set in new pages added to the page table. This resulted in pages
that appeared anonymous in a VM_SHARED vma and triggered the BUG.
Address issue by adding a new zap flag ZAP_FLAG_UNMAP to indicate an unmap
call from unmap_vmas(). This is used to indicate the 'final' unmapping of
a hugetlb vma. When called via MADV_DONTNEED, this flag is not set and
the vm_lock is not deleted.
[1] https://lore.kernel.org/lkml/CAO4mrfdLMXsao9RF4fUE8-Wfde8xmjsKrTNMNC9wjUb6J…
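As an editorial aside (not part of the patch), a minimal user-space sketch of the sequence described above, assuming a hugetlb pool with free 2MB pages has been configured (e.g. via /proc/sys/vm/nr_hugepages): the vma survives madvise(MADV_DONTNEED) and must remain usable for later faults.
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
int main(void)
{
	size_t len = 2UL << 20;	/* one 2MB hugetlb page */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(p, 0xaa, len);			/* populate the page table */
	madvise(p, len, MADV_DONTNEED);		/* zaps ptes, vma remains */
	memset(p, 0x55, len);			/* re-fault the same vma */
	munmap(p, len);
	return 0;
}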
Link: https://lkml.kernel.org/r/20221114235507.294320-3-mike.kravetz@oracle.com
Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reported-by: Wei Chen <harperchen1110(a)gmail.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mina Almasry <almasrymina(a)google.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: Naoya Horiguchi <naoya.horiguchi(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 2 ++
mm/hugetlb.c | 27 ++++++++++++++++-----------
mm/memory.c | 2 +-
3 files changed, 19 insertions(+), 12 deletions(-)
--- a/include/linux/mm.h~hugetlb-dont-delete-vma_lock-in-hugetlb-madv_dontneed-processing
+++ a/include/linux/mm.h
@@ -1868,6 +1868,8 @@ struct zap_details {
* default, the flag is not set.
*/
#define ZAP_FLAG_DROP_MARKER ((__force zap_flags_t) BIT(0))
+/* Set in unmap_vmas() to indicate a final unmap call. Only used by hugetlb */
+#define ZAP_FLAG_UNMAP ((__force zap_flags_t) BIT(1))
#ifdef CONFIG_MMU
extern bool can_do_mlock(void);
--- a/mm/hugetlb.c~hugetlb-dont-delete-vma_lock-in-hugetlb-madv_dontneed-processing
+++ a/mm/hugetlb.c
@@ -5204,17 +5204,22 @@ void __unmap_hugepage_range_final(struct
__unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);
- /*
- * Unlock and free the vma lock before releasing i_mmap_rwsem. When
- * the vma_lock is freed, this makes the vma ineligible for pmd
- * sharing. And, i_mmap_rwsem is required to set up pmd sharing.
- * This is important as page tables for this unmapped range will
- * be asynchrously deleted. If the page tables are shared, there
- * will be issues when accessed by someone else.
- */
- __hugetlb_vma_unlock_write_free(vma);
-
- i_mmap_unlock_write(vma->vm_file->f_mapping);
+ if (zap_flags & ZAP_FLAG_UNMAP) { /* final unmap */
+ /*
+ * Unlock and free the vma lock before releasing i_mmap_rwsem.
+ * When the vma_lock is freed, this makes the vma ineligible
+ * for pmd sharing. And, i_mmap_rwsem is required to set up
+ * pmd sharing. This is important as page tables for this
+ * unmapped range will be asynchrously deleted. If the page
+ * tables are shared, there will be issues when accessed by
+ * someone else.
+ */
+ __hugetlb_vma_unlock_write_free(vma);
+ i_mmap_unlock_write(vma->vm_file->f_mapping);
+ } else {
+ i_mmap_unlock_write(vma->vm_file->f_mapping);
+ hugetlb_vma_unlock_write(vma);
+ }
}
void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
--- a/mm/memory.c~hugetlb-dont-delete-vma_lock-in-hugetlb-madv_dontneed-processing
+++ a/mm/memory.c
@@ -1711,7 +1711,7 @@ void unmap_vmas(struct mmu_gather *tlb,
{
struct mmu_notifier_range range;
struct zap_details details = {
- .zap_flags = ZAP_FLAG_DROP_MARKER,
+ .zap_flags = ZAP_FLAG_DROP_MARKER | ZAP_FLAG_UNMAP,
/* Careful - we need to zap private pages too! */
.even_cows = true,
};
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
ipc-shm-call-underlying-open-close-vm_ops.patch
madvise-use-zap_page_range_single-for-madvise-dontneed.patch
hugetlb-dont-delete-vma_lock-in-hugetlb-madv_dontneed-processing.patch
selftests-vm-update-hugetlb-madvise.patch
hugetlb-remove-duplicate-mmu-notifications.patch
The patch titled
Subject: madvise: use zap_page_range_single for madvise dontneed
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
madvise-use-zap_page_range_single-for-madvise-dontneed.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: madvise: use zap_page_range_single for madvise dontneed
Date: Mon, 14 Nov 2022 15:55:05 -0800
This series addresses the issue first reported in [1], and fully described
in patch 2. Patches 1 and 2 address the user visible issue and are tagged
for stable backports.
While exploring solutions to this issue, related problems with mmu
notification calls were discovered. This is addressed in the patch
"hugetlb: remove duplicate mmu notifications:". Since there are no user
visible effects, this third is not tagged for stable backports.
Previous discussions suggested further cleanup by removing the
routine zap_page_range. This is possible because zap_page_range_single
is now exported, and all callers of zap_page_range pass ranges entirely
within a single vma. This work will be done in a later patch so as not
to distract from this bug fix.
[1] https://lore.kernel.org/lkml/CAO4mrfdLMXsao9RF4fUE8-Wfde8xmjsKrTNMNC9wjUb6J…
This patch (of 2):
Expose the routine zap_page_range_single to zap a range within a single
vma. The madvise routine madvise_dontneed_single_vma can use this routine
as it explicitly operates on a single vma. Also, update the mmu
notification range in zap_page_range_single to take hugetlb pmd sharing
into account. This is required as MADV_DONTNEED supports hugetlb vmas.
Link: https://lkml.kernel.org/r/20221114235507.294320-1-mike.kravetz@oracle.com
Link: https://lkml.kernel.org/r/20221114235507.294320-2-mike.kravetz@oracle.com
Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reported-by: Wei Chen <harperchen1110(a)gmail.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mina Almasry <almasrymina(a)google.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: Naoya Horiguchi <naoya.horiguchi(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 27 +++++++++++++++++++--------
mm/madvise.c | 6 +++---
mm/memory.c | 23 +++++++++++------------
3 files changed, 33 insertions(+), 23 deletions(-)
--- a/include/linux/mm.h~madvise-use-zap_page_range_single-for-madvise-dontneed
+++ a/include/linux/mm.h
@@ -1852,6 +1852,23 @@ static void __maybe_unused show_free_are
__show_free_areas(flags, nodemask, MAX_NR_ZONES - 1);
}
+/*
+ * Parameter block passed down to zap_pte_range in exceptional cases.
+ */
+struct zap_details {
+ struct folio *single_folio; /* Locked folio to be unmapped */
+ bool even_cows; /* Zap COWed private pages too? */
+ zap_flags_t zap_flags; /* Extra flags for zapping */
+};
+
+/*
+ * Whether to drop the pte markers, for example, the uffd-wp information for
+ * file-backed memory. This should only be specified when we will completely
+ * drop the page in the mm, either by truncation or unmapping of the vma. By
+ * default, the flag is not set.
+ */
+#define ZAP_FLAG_DROP_MARKER ((__force zap_flags_t) BIT(0))
+
#ifdef CONFIG_MMU
extern bool can_do_mlock(void);
#else
@@ -1869,6 +1886,8 @@ void zap_vma_ptes(struct vm_area_struct
unsigned long size);
void zap_page_range(struct vm_area_struct *vma, unsigned long address,
unsigned long size);
+void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
+ unsigned long size, struct zap_details *details);
void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
struct vm_area_struct *start_vma, unsigned long start,
unsigned long end);
@@ -3467,12 +3486,4 @@ madvise_set_anon_name(struct mm_struct *
}
#endif
-/*
- * Whether to drop the pte markers, for example, the uffd-wp information for
- * file-backed memory. This should only be specified when we will completely
- * drop the page in the mm, either by truncation or unmapping of the vma. By
- * default, the flag is not set.
- */
-#define ZAP_FLAG_DROP_MARKER ((__force zap_flags_t) BIT(0))
-
#endif /* _LINUX_MM_H */
--- a/mm/madvise.c~madvise-use-zap_page_range_single-for-madvise-dontneed
+++ a/mm/madvise.c
@@ -772,8 +772,8 @@ static int madvise_free_single_vma(struc
* Application no longer needs these pages. If the pages are dirty,
* it's OK to just throw them away. The app will be more careful about
* data it wants to keep. Be sure to free swap resources too. The
- * zap_page_range call sets things up for shrink_active_list to actually free
- * these pages later if no one else has touched them in the meantime,
+ * zap_page_range_single call sets things up for shrink_active_list to actually
+ * free these pages later if no one else has touched them in the meantime,
* although we could add these pages to a global reuse list for
* shrink_active_list to pick up before reclaiming other pages.
*
@@ -790,7 +790,7 @@ static int madvise_free_single_vma(struc
static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
- zap_page_range(vma, start, end - start);
+ zap_page_range_single(vma, start, end - start, NULL);
return 0;
}
--- a/mm/memory.c~madvise-use-zap_page_range_single-for-madvise-dontneed
+++ a/mm/memory.c
@@ -1341,15 +1341,6 @@ copy_page_range(struct vm_area_struct *d
return ret;
}
-/*
- * Parameter block passed down to zap_pte_range in exceptional cases.
- */
-struct zap_details {
- struct folio *single_folio; /* Locked folio to be unmapped */
- bool even_cows; /* Zap COWed private pages too? */
- zap_flags_t zap_flags; /* Extra flags for zapping */
-};
-
/* Whether we should zap all COWed (private) pages too */
static inline bool should_zap_cows(struct zap_details *details)
{
@@ -1774,19 +1765,27 @@ void zap_page_range(struct vm_area_struc
*
* The range must fit into one VMA.
*/
-static void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
+void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
unsigned long size, struct zap_details *details)
{
+ const unsigned long end = address + size;
struct mmu_notifier_range range;
struct mmu_gather tlb;
lru_add_drain();
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
- address, address + size);
+ address, end);
+ if (is_vm_hugetlb_page(vma))
+ adjust_range_if_pmd_sharing_possible(vma, &range.start,
+ &range.end);
tlb_gather_mmu(&tlb, vma->vm_mm);
update_hiwater_rss(vma->vm_mm);
mmu_notifier_invalidate_range_start(&range);
- unmap_single_vma(&tlb, vma, address, range.end, details);
+ /*
+ * unmap 'address-end' not 'range.start-range.end' as range
+ * could have been expanded for hugetlb pmd sharing.
+ */
+ unmap_single_vma(&tlb, vma, address, end, details);
mmu_notifier_invalidate_range_end(&range);
tlb_finish_mmu(&tlb);
}
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
ipc-shm-call-underlying-open-close-vm_ops.patch
madvise-use-zap_page_range_single-for-madvise-dontneed.patch
hugetlb-dont-delete-vma_lock-in-hugetlb-madv_dontneed-processing.patch
selftests-vm-update-hugetlb-madvise.patch
hugetlb-remove-duplicate-mmu-notifications.patch
The patch titled
Subject: mm/migrate: fix read-only page got writable when recover pte
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-migrate-fix-read-only-page-got-writable-when-recover-pte.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/migrate: fix read-only page got writable when recover pte
Date: Sun, 13 Nov 2022 19:04:46 -0500
Ives van Hoorne from codesandbox.io reported an issue regarding possible
data loss of uffd-wp when applied to memfds on heavily loaded systems.
The symptom is that some pages read back from the snapshot child VMs show a
data mismatch.
I can also reproduce this with a Rust reproducer provided by Ives that keeps
taking snapshots of a 256MB VM; on a 32G system, when I start 80 instances I
can trigger the issue within ten minutes.
It turns out that some writes went through even though uffd-wp was applied
to the pte.
The problem is that, when removing migration entries, we didn't really worry
about the write bit as long as we knew it wasn't a write migration entry.
That assumption doesn't always hold: for some memory types (e.g. writable
shmem) mk_pte can return a pte with the write bit set, so to recover the
migration entry to its original state we need to explicitly wr-protect the
pte, or a read migration entry will end up with the write bit set. For uffd
this lets writes through.
The relevant code on uffd was introduced in the anon support, which is
commit f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration",
2020-04-07). However, anon shouldn't suffer from this problem because the
write bit should always already be cleared for anon, so that may not be a
proper Fixes target; the Fixes tag instead points to the uffd shmem support.
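As an editorial aside, a standalone sketch of the corrected recovery logic in the hunk below, with the pte reduced to two booleans; the struct and helper names are made up for the example.
#include <stdbool.h>
#include <stdio.h>
struct fake_pte {
	bool write;
	bool uffd_wp;
};
/*
 * Mirrors the fixed branch in remove_migration_pte(): mk_pte() may hand
 * back a writable pte (e.g. on writable shmem), so a read migration entry
 * must be explicitly wr-protected before the uffd-wp bit is reapplied.
 */
static struct fake_pte recover_pte(bool writable_entry, bool swp_uffd_wp,
				   bool mk_pte_gave_write)
{
	struct fake_pte pte = { .write = mk_pte_gave_write, .uffd_wp = false };
	if (writable_entry)
		pte.write = true;	/* maybe_mkwrite() */
	else
		pte.write = false;	/* pte_wrprotect(): the fix */
	if (swp_uffd_wp)
		pte.uffd_wp = true;	/* pte_mkuffd_wp() */
	return pte;
}
int main(void)
{
	/* read migration entry on writable shmem, uffd-wp armed */
	struct fake_pte pte = recover_pte(false, true, true);
	printf("write=%d uffd_wp=%d\n", pte.write, pte.uffd_wp);	/* 0 1 */
	return 0;
}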
Link: https://lkml.kernel.org/r/20221114000447.1681003-2-peterx@redhat.com
Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs")
Reported-by: Ives van Hoorne <ives(a)codesandbox.io>
Reviewed-by: Alistair Popple <apopple(a)nvidia.com>
Tested-by: Ives van Hoorne <ives(a)codesandbox.io>
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
--- a/mm/migrate.c~mm-migrate-fix-read-only-page-got-writable-when-recover-pte
+++ a/mm/migrate.c
@@ -213,8 +213,14 @@ static bool remove_migration_pte(struct
pte = pte_mkdirty(pte);
if (is_writable_migration_entry(entry))
pte = maybe_mkwrite(pte, vma);
- else if (pte_swp_uffd_wp(*pvmw.pte))
+ else
+ /* NOTE: mk_pte can have write bit set */
+ pte = pte_wrprotect(pte);
+
+ if (pte_swp_uffd_wp(*pvmw.pte)) {
+ WARN_ON_ONCE(pte_write(pte));
pte = pte_mkuffd_wp(pte);
+ }
if (folio_test_anon(folio) && !is_readable_migration_entry(entry))
rmap_flags |= RMAP_EXCLUSIVE;
_
Patches currently in -mm which might be from peterx(a)redhat.com are
mm-migrate-fix-read-only-page-got-writable-when-recover-pte.patch
mm-always-compile-in-pte-markers.patch
mm-use-pte-markers-for-swap-errors.patch
madvise(MADV_DONTNEED) ends up calling zap_page_range() to clear page
tables associated with the address range. For hugetlb vmas,
zap_page_range will call __unmap_hugepage_range_final. However,
__unmap_hugepage_range_final assumes the passed vma is about to be removed
and deletes the vma_lock to prevent pmd sharing as the vma is on the way
out. In the case of madvise(MADV_DONTNEED) the vma remains, but the
missing vma_lock prevents pmd sharing and could potentially lead to issues
with truncation/fault races.
This issue was originally reported here [1] as a BUG triggered in
page_try_dup_anon_rmap. Prior to the introduction of the hugetlb
vma_lock, __unmap_hugepage_range_final cleared the VM_MAYSHARE flag to
prevent pmd sharing. Subsequent faults on this vma were confused:
VM_MAYSHARE indicates a sharable vma, but since it was not set, page_mapping
was not set in new pages added to the page table. This resulted in pages
that appeared anonymous in a VM_SHARED vma and triggered the BUG.
Address issue by adding a new zap flag ZAP_FLAG_UNMAP to indicate an unmap
call from unmap_vmas(). This is used to indicate the 'final' unmapping of
a hugetlb vma. When called via MADV_DONTNEED, this flag is not set and the
vm_lock is not deleted.
[1] https://lore.kernel.org/lkml/CAO4mrfdLMXsao9RF4fUE8-Wfde8xmjsKrTNMNC9wjUb6J…
Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reported-by: Wei Chen <harperchen1110(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
---
include/linux/mm.h | 2 ++
mm/hugetlb.c | 27 ++++++++++++++++-----------
mm/memory.c | 2 +-
3 files changed, 19 insertions(+), 12 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index dd5a38682537..a4e24dd2d96e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1886,6 +1886,8 @@ struct zap_details {
* default, the flag is not set.
*/
#define ZAP_FLAG_DROP_MARKER ((__force zap_flags_t) BIT(0))
+/* Set in unmap_vmas() to indicate a final unmap call. Only used by hugetlb */
+#define ZAP_FLAG_UNMAP ((__force zap_flags_t) BIT(1))
#ifdef CONFIG_MMU
extern bool can_do_mlock(void);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9d765364231e..7559b9dfe782 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5210,17 +5210,22 @@ void __unmap_hugepage_range_final(struct mmu_gather *tlb,
__unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);
- /*
- * Unlock and free the vma lock before releasing i_mmap_rwsem. When
- * the vma_lock is freed, this makes the vma ineligible for pmd
- * sharing. And, i_mmap_rwsem is required to set up pmd sharing.
- * This is important as page tables for this unmapped range will
- * be asynchrously deleted. If the page tables are shared, there
- * will be issues when accessed by someone else.
- */
- __hugetlb_vma_unlock_write_free(vma);
-
- i_mmap_unlock_write(vma->vm_file->f_mapping);
+ if (zap_flags & ZAP_FLAG_UNMAP) { /* final unmap */
+ /*
+ * Unlock and free the vma lock before releasing i_mmap_rwsem.
+ * When the vma_lock is freed, this makes the vma ineligible
+ * for pmd sharing. And, i_mmap_rwsem is required to set up
+ * pmd sharing. This is important as page tables for this
+ * unmapped range will be asynchrously deleted. If the page
+ * tables are shared, there will be issues when accessed by
+ * someone else.
+ */
+ __hugetlb_vma_unlock_write_free(vma);
+ i_mmap_unlock_write(vma->vm_file->f_mapping);
+ } else {
+ i_mmap_unlock_write(vma->vm_file->f_mapping);
+ hugetlb_vma_unlock_write(vma);
+ }
}
void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
diff --git a/mm/memory.c b/mm/memory.c
index a177f6bbfafc..6d77bc00bca1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1673,7 +1673,7 @@ void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
{
struct mmu_notifier_range range;
struct zap_details details = {
- .zap_flags = ZAP_FLAG_DROP_MARKER,
+ .zap_flags = ZAP_FLAG_DROP_MARKER | ZAP_FLAG_UNMAP,
/* Careful - we need to zap private pages too! */
.even_cows = true,
};
--
2.38.1
Expose the routine zap_page_range_single to zap a range within a single
vma. The madvise routine madvise_dontneed_single_vma can use this
routine as it explicitly operates on a single vma. Also, update the mmu
notification range in zap_page_range_single to take hugetlb pmd sharing
into account. This is required as MADV_DONTNEED supports hugetlb vmas.
Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reported-by: Wei Chen <harperchen1110(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
---
include/linux/mm.h | 27 +++++++++++++++++++--------
mm/madvise.c | 6 +++---
mm/memory.c | 23 +++++++++++------------
3 files changed, 33 insertions(+), 23 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9838b535fa21..dd5a38682537 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1870,6 +1870,23 @@ static void __maybe_unused show_free_areas(unsigned int flags, nodemask_t *nodem
__show_free_areas(flags, nodemask, MAX_NR_ZONES - 1);
}
+/*
+ * Parameter block passed down to zap_pte_range in exceptional cases.
+ */
+struct zap_details {
+ struct folio *single_folio; /* Locked folio to be unmapped */
+ bool even_cows; /* Zap COWed private pages too? */
+ zap_flags_t zap_flags; /* Extra flags for zapping */
+};
+
+/*
+ * Whether to drop the pte markers, for example, the uffd-wp information for
+ * file-backed memory. This should only be specified when we will completely
+ * drop the page in the mm, either by truncation or unmapping of the vma. By
+ * default, the flag is not set.
+ */
+#define ZAP_FLAG_DROP_MARKER ((__force zap_flags_t) BIT(0))
+
#ifdef CONFIG_MMU
extern bool can_do_mlock(void);
#else
@@ -1887,6 +1904,8 @@ void zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
unsigned long size);
void zap_page_range(struct vm_area_struct *vma, unsigned long address,
unsigned long size);
+void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
+ unsigned long size, struct zap_details *details);
void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
struct vm_area_struct *start_vma, unsigned long start,
unsigned long end);
@@ -3518,12 +3537,4 @@ madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
}
#endif
-/*
- * Whether to drop the pte markers, for example, the uffd-wp information for
- * file-backed memory. This should only be specified when we will completely
- * drop the page in the mm, either by truncation or unmapping of the vma. By
- * default, the flag is not set.
- */
-#define ZAP_FLAG_DROP_MARKER ((__force zap_flags_t) BIT(0))
-
#endif /* _LINUX_MM_H */
diff --git a/mm/madvise.c b/mm/madvise.c
index df62d9e1035a..a21b186eb7a0 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -785,8 +785,8 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
* Application no longer needs these pages. If the pages are dirty,
* it's OK to just throw them away. The app will be more careful about
* data it wants to keep. Be sure to free swap resources too. The
- * zap_page_range call sets things up for shrink_active_list to actually free
- * these pages later if no one else has touched them in the meantime,
+ * zap_page_range_single call sets things up for shrink_active_list to actually
+ * free these pages later if no one else has touched them in the meantime,
* although we could add these pages to a global reuse list for
* shrink_active_list to pick up before reclaiming other pages.
*
@@ -803,7 +803,7 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
- zap_page_range(vma, start, end - start);
+ zap_page_range_single(vma, start, end - start, NULL);
return 0;
}
diff --git a/mm/memory.c b/mm/memory.c
index 98ddb91df9a7..a177f6bbfafc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1294,15 +1294,6 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
return ret;
}
-/*
- * Parameter block passed down to zap_pte_range in exceptional cases.
- */
-struct zap_details {
- struct folio *single_folio; /* Locked folio to be unmapped */
- bool even_cows; /* Zap COWed private pages too? */
- zap_flags_t zap_flags; /* Extra flags for zapping */
-};
-
/* Whether we should zap all COWed (private) pages too */
static inline bool should_zap_cows(struct zap_details *details)
{
@@ -1736,19 +1727,27 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
*
* The range must fit into one VMA.
*/
-static void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
+void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
unsigned long size, struct zap_details *details)
{
+ const unsigned long end = address + size;
struct mmu_notifier_range range;
struct mmu_gather tlb;
lru_add_drain();
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
- address, address + size);
+ address, end);
+ if (is_vm_hugetlb_page(vma))
+ adjust_range_if_pmd_sharing_possible(vma, &range.start,
+ &range.end);
tlb_gather_mmu(&tlb, vma->vm_mm);
update_hiwater_rss(vma->vm_mm);
mmu_notifier_invalidate_range_start(&range);
- unmap_single_vma(&tlb, vma, address, range.end, details);
+ /*
+ * unmap 'address-end' not 'range.start-range.end' as range
+ * could have been expanded for hugetlb pmd sharing.
+ */
+ unmap_single_vma(&tlb, vma, address, end, details);
mmu_notifier_invalidate_range_end(&range);
tlb_finish_mmu(&tlb);
}
--
2.38.1
Now that we've fixed the issue with using the incorrect topology manager,
we're actually grabbing the topology manager's lock - and consequently
deadlocking. Luckily for us though, there's actually nothing in AMD's DSC
state computation code that really should need this lock. The one exception
is the mutex_lock() in dm_dp_mst_is_port_support_mode(); however, we grab no
locks beneath &mgr->lock there, so that one is fine to leave as-is.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Gitlab issue: https://gitlab.freedesktop.org/drm/amd/-/issues/2171
Fixes: 8c20a1ed9b4f ("drm/amd/display: MST DSC compute fair share")
Cc: <stable(a)vger.kernel.org> # v5.6+
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index 5196c9a0e432d..59648f5ffb59d 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -1148,10 +1148,8 @@ int compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
continue;
mst_mgr = aconnector->port->mgr;
- mutex_lock(&mst_mgr->lock);
ret = compute_mst_dsc_configs_for_link(state, dc_state, stream->link, vars, mst_mgr,
&link_vars_start_index);
- mutex_unlock(&mst_mgr->lock);
if (ret != 0)
return ret;
@@ -1208,10 +1206,8 @@ static int pre_compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
continue;
mst_mgr = aconnector->port->mgr;
- mutex_lock(&mst_mgr->lock);
ret = compute_mst_dsc_configs_for_link(state, dc_state, stream->link, vars, mst_mgr,
&link_vars_start_index);
- mutex_unlock(&mst_mgr->lock);
if (ret != 0)
return ret;
--
2.37.3
This bug hurt me. Basically, it appears that we've been grabbing the
entirely wrong mutex in the MST DSC computation code for amdgpu! While
we've been grabbing:
amdgpu_dm_connector->mst_mgr
That's zero-initialized memory, because the only connectors we'll ever
actually be doing DSC computations for are MST ports. Which have mst_mgr
zero-initialized, and instead have the correct topology mgr pointer located
at:
amdgpu_dm_connector->mst_port->mgr;
I'm a bit impressed that until now, this code has managed not to crash
anyone's systems! It does seem to cause a warning in LOCKDEP though:
[ 66.637670] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
This was causing the problems that appeared to have been introduced by:
commit 4d07b0bc4034 ("drm/display/dp_mst: Move all payload info into the atomic state")
This wasn't actually where they came from, though. Presumably, before that
commit, the only thing we were doing with the topology mgr pointer was
attempting to grab mst_mgr->lock. Since the above commit, however, we grab
much more information from mst_mgr, including the atomic MST state and the
respective modesetting locks.
This patch also implies that up until now, it's quite likely we could be
susceptible to race conditions when going through the MST topology state
for DSC computations since we technically will not have grabbed any lock
when going through it.
So, let's fix this by adjusting all the respective code paths to look at
the right pointer and skip things that aren't actual MST connectors from a
topology.
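As an editorial aside, a simplified, runnable sketch of the pointer mix-up described above; the struct layout below is a stand-in for illustration, not the real amdgpu definitions.
#include <stdio.h>
struct topology_mgr {
	int lock;			/* stands in for the real mutex + state */
};
struct mst_port {
	struct topology_mgr *mgr;	/* the manager that owns the topology */
};
struct connector {
	struct topology_mgr mst_mgr;	/* zero-initialized for MST ports */
	struct mst_port *mst_port;
};
int main(void)
{
	struct topology_mgr real_mgr = { .lock = 42 };
	struct mst_port port = { .mgr = &real_mgr };
	struct connector conn = { .mst_port = &port };
	/* wrong: the zeroed embedded struct, never the real topology manager */
	printf("&conn->mst_mgr lock      = %d\n", conn.mst_mgr.lock);
	/* right: the shared manager the port actually belongs to */
	printf("conn->mst_port->mgr lock = %d\n", conn.mst_port->mgr->lock);
	return 0;
}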
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Gitlab issue: https://gitlab.freedesktop.org/drm/amd/-/issues/2171
Fixes: 8c20a1ed9b4f ("drm/amd/display: MST DSC compute fair share")
Cc: <stable(a)vger.kernel.org> # v5.6+
---
.../display/amdgpu_dm/amdgpu_dm_mst_types.c | 37 +++++++++----------
1 file changed, 18 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index bba2e8aaa2c20..5196c9a0e432d 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -1117,6 +1117,7 @@ int compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
struct dc_stream_state *stream;
bool computed_streams[MAX_PIPES];
struct amdgpu_dm_connector *aconnector;
+ struct drm_dp_mst_topology_mgr *mst_mgr;
int link_vars_start_index = 0;
int ret = 0;
@@ -1131,7 +1132,7 @@ int compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
aconnector = (struct amdgpu_dm_connector *)stream->dm_stream_context;
- if (!aconnector || !aconnector->dc_sink)
+ if (!aconnector || !aconnector->dc_sink || !aconnector->port)
continue;
if (!aconnector->dc_sink->dsc_caps.dsc_dec_caps.is_dsc_supported)
@@ -1146,16 +1147,13 @@ int compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
if (!is_dsc_need_re_compute(state, dc_state, stream->link))
continue;
- mutex_lock(&aconnector->mst_mgr.lock);
-
- ret = compute_mst_dsc_configs_for_link(state, dc_state, stream->link, vars,
- &aconnector->mst_mgr,
+ mst_mgr = aconnector->port->mgr;
+ mutex_lock(&mst_mgr->lock);
+ ret = compute_mst_dsc_configs_for_link(state, dc_state, stream->link, vars, mst_mgr,
&link_vars_start_index);
- if (ret != 0) {
- mutex_unlock(&aconnector->mst_mgr.lock);
+ mutex_unlock(&mst_mgr->lock);
+ if (ret != 0)
return ret;
- }
- mutex_unlock(&aconnector->mst_mgr.lock);
for (j = 0; j < dc_state->stream_count; j++) {
if (dc_state->streams[j]->link == stream->link)
@@ -1182,6 +1180,7 @@ static int pre_compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
struct dc_stream_state *stream;
bool computed_streams[MAX_PIPES];
struct amdgpu_dm_connector *aconnector;
+ struct drm_dp_mst_topology_mgr *mst_mgr;
int link_vars_start_index = 0;
int ret;
@@ -1196,7 +1195,7 @@ static int pre_compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
aconnector = (struct amdgpu_dm_connector *)stream->dm_stream_context;
- if (!aconnector || !aconnector->dc_sink)
+ if (!aconnector || !aconnector->dc_sink || !aconnector->port)
continue;
if (!aconnector->dc_sink->dsc_caps.dsc_dec_caps.is_dsc_supported)
@@ -1208,15 +1207,13 @@ static int pre_compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
if (!is_dsc_need_re_compute(state, dc_state, stream->link))
continue;
- mutex_lock(&aconnector->mst_mgr.lock);
- ret = compute_mst_dsc_configs_for_link(state, dc_state, stream->link, vars,
- &aconnector->mst_mgr,
+ mst_mgr = aconnector->port->mgr;
+ mutex_lock(&mst_mgr->lock);
+ ret = compute_mst_dsc_configs_for_link(state, dc_state, stream->link, vars, mst_mgr,
&link_vars_start_index);
- if (ret != 0) {
- mutex_unlock(&aconnector->mst_mgr.lock);
+ mutex_unlock(&mst_mgr->lock);
+ if (ret != 0)
return ret;
- }
- mutex_unlock(&aconnector->mst_mgr.lock);
for (j = 0; j < dc_state->stream_count; j++) {
if (dc_state->streams[j]->link == stream->link)
@@ -1419,6 +1416,7 @@ enum dc_status dm_dp_mst_is_port_support_mode(
unsigned int upper_link_bw_in_kbps = 0, down_link_bw_in_kbps = 0;
unsigned int max_compressed_bw_in_kbps = 0;
struct dc_dsc_bw_range bw_range = {0};
+ struct drm_dp_mst_topology_mgr *mst_mgr;
/*
* check if the mode could be supported if DSC pass-through is supported
@@ -1427,7 +1425,8 @@ enum dc_status dm_dp_mst_is_port_support_mode(
*/
if (is_dsc_common_config_possible(stream, &bw_range) &&
aconnector->port->passthrough_aux) {
- mutex_lock(&aconnector->mst_mgr.lock);
+ mst_mgr = aconnector->port->mgr;
+ mutex_lock(&mst_mgr->lock);
cur_link_settings = stream->link->verified_link_cap;
@@ -1440,7 +1439,7 @@ enum dc_status dm_dp_mst_is_port_support_mode(
end_to_end_bw_in_kbps = min(upper_link_bw_in_kbps,
down_link_bw_in_kbps);
- mutex_unlock(&aconnector->mst_mgr.lock);
+ mutex_unlock(&mst_mgr->lock);
/*
* use the maximum dsc compression bandwidth as the required
--
2.37.3
Looks like we're accidentally dropping a pretty important return code
here. For some reason, we just return -EINVAL if we fail to get the MST
topology state. This is wrong: error codes are important and should never
be squashed without being handled, which here seems to have the potential
to cause a deadlock.
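For context, a self-contained sketch of the ERR_PTR convention at play; the macros below emulate the kernel's helpers in user space, and -EDEADLK is just one plausible error code that must survive to the caller rather than being squashed to -EINVAL.
#include <errno.h>
#include <stdio.h>
/* Minimal user-space stand-ins for the kernel's ERR_PTR helpers. */
#define MAX_ERRNO	4095
#define ERR_PTR(err)	((void *)(long)(err))
#define IS_ERR(ptr)	((unsigned long)(ptr) >= (unsigned long)-MAX_ERRNO)
#define PTR_ERR(ptr)	((long)(ptr))
static void *get_state(int fail_with)
{
	return fail_with ? ERR_PTR(-fail_with) : (void *)"state";
}
int main(void)
{
	void *state = get_state(EDEADLK);
	if (IS_ERR(state)) {
		/* Returning a hard-coded -EINVAL here would hide the real code. */
		printf("propagate %ld, not %d\n", PTR_ERR(state), -EINVAL);
		return 1;
	}
	return 0;
}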
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Reviewed-by: Wayne Lin <Wayne.Lin(a)amd.com>
Fixes: 8ec046716ca8 ("drm/dp_mst: Add helper to trigger modeset on affected DSC MST CRTCs")
Cc: <stable(a)vger.kernel.org> # v5.6+
---
drivers/gpu/drm/display/drm_dp_mst_topology.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c b/drivers/gpu/drm/display/drm_dp_mst_topology.c
index ecd22c038c8c0..51a46689cda70 100644
--- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
@@ -5186,7 +5186,7 @@ int drm_dp_mst_add_affected_dsc_crtcs(struct drm_atomic_state *state, struct drm
mst_state = drm_atomic_get_mst_topology_state(state, mgr);
if (IS_ERR(mst_state))
- return -EINVAL;
+ return PTR_ERR(mst_state);
list_for_each_entry(pos, &mst_state->payloads, next) {
--
2.37.3
The patch titled
Subject: mm/damon/sysfs-schemes: skip stats update if the scheme directory is removed
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-damon-sysfs-schemes-skip-stats-update-if-the-scheme-directory-is-removed.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/sysfs-schemes: skip stats update if the scheme directory is removed
Date: Mon, 14 Nov 2022 17:55:52 +0000
A DAMON sysfs interface user can start DAMON with a scheme, remove the
sysfs directory for the scheme, and then ask for an update of the scheme's
stats. Because the scheme stats update logic isn't aware of the situation, it
results in an invalid memory access. Fix the bug by checking if the
scheme sysfs directory exists.
Link: https://lkml.kernel.org/r/20221114175552.1951-1-sj@kernel.org
Fixes: 0ac32b8affb5 ("mm/damon/sysfs: support DAMOS stats")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [v5.18]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/sysfs.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-schemes-skip-stats-update-if-the-scheme-directory-is-removed
+++ a/mm/damon/sysfs.c
@@ -2339,6 +2339,10 @@ static int damon_sysfs_upd_schemes_stats
damon_for_each_scheme(scheme, ctx) {
struct damon_sysfs_stats *sysfs_stats;
+ /* user could removed the scheme sysfs dir */
+ if (schemes_idx >= sysfs_schemes->nr)
+ break;
+
sysfs_stats = sysfs_schemes->schemes_arr[schemes_idx++]->stats;
sysfs_stats->nr_tried = scheme->stat.nr_tried;
sysfs_stats->sz_tried = scheme->stat.sz_tried;
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-sysfs-schemes-skip-stats-update-if-the-scheme-directory-is-removed.patch
docs-admin-guide-mm-damon-usage-describe-the-rules-of-sysfs-region-directories.patch
docs-admin-guide-mm-damon-usage-fix-wrong-usage-example-of-init_regions-file.patch
mm-damon-core-add-a-callback-for-scheme-target-regions-check.patch
mm-damon-sysfs-schemes-implement-schemes-tried_regions-directory.patch
mm-damon-sysfs-schemes-implement-scheme-region-directory.patch
mm-damon-sysfs-implement-damos-tried-regions-update-command.patch
mm-damon-sysfs-implement-damos-tried-regions-update-command-fix.patch
mm-damon-sysfs-schemes-implement-damos-tried-regions-clear-command.patch
mm-damon-sysfs-schemes-implement-damos-tried-regions-clear-command-fix.patch
tools-selftets-damon-sysfs-test-tried_regions-directory-existence.patch
docs-admin-guide-mm-damon-usage-document-schemes-s-tried_regions-sysfs-directory.patch
docs-abi-damon-document-schemes-s-tried_regions-sysfs-directory.patch
selftests-damon-test-non-context-inputs-to-rm_contexts-file.patch
The patch titled
Subject: mm: correctly charge compressed memory to its memcg
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-correctly-charge-compressed-memory-to-its-memcg.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Li Liguang <liliguang(a)baidu.com>
Subject: mm: correctly charge compressed memory to its memcg
Date: Mon, 14 Nov 2022 14:48:28 -0500
Kswapd will reclaim memory when memory pressure is high; the anonymous
memory will be compressed and stored in the zpool if zswap is enabled.
The memcg_kmem_bypass() check in get_obj_cgroup_from_page() bypasses the
kernel thread and causes the compressed memory not to be charged to its
memory cgroup.
Remove the memcg_kmem_bypass() call and properly charge compressed memory
to its corresponding memory cgroup.
Link: https://lore.kernel.org/linux-mm/CALvZod4nnn8BHYqAM4xtcR0Ddo2-Wr8uKm9h_CHWU…
Link: https://lkml.kernel.org/r/20221114194828.100822-1-hannes@cmpxchg.org
Fixes: f4840ccfca25 ("zswap: memcg accounting")
Signed-off-by: Li Liguang <liliguang(a)baidu.com>
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Acked-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Muchun Song <songmuchun(a)bytedance.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: <stable(a)vger.kernel.org> [5.19+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/memcontrol.c~mm-correctly-charge-compressed-memory-to-its-memcg
+++ a/mm/memcontrol.c
@@ -3026,7 +3026,7 @@ struct obj_cgroup *get_obj_cgroup_from_p
{
struct obj_cgroup *objcg;
- if (!memcg_kmem_enabled() || memcg_kmem_bypass())
+ if (!memcg_kmem_enabled())
return NULL;
if (PageMemcgKmem(page)) {
_
Patches currently in -mm which might be from liliguang(a)baidu.com are
mm-correctly-charge-compressed-memory-to-its-memcg.patch
From: Li Liguang <liliguang(a)baidu.com>
Kswapd will reclaim memory when memory pressure is high; the
anonymous memory will be compressed and stored in the zpool
if zswap is enabled. The memcg_kmem_bypass() check in
get_obj_cgroup_from_page() bypasses the kernel thread and
causes the compressed memory not to be charged to its memory cgroup.
Remove the memcg_kmem_bypass() and properly charge compressed
memory to its corresponding memory cgroup.
Link: https://lore.kernel.org/linux-mm/CALvZod4nnn8BHYqAM4xtcR0Ddo2-Wr8uKm9h_CHWU…
Fixes: f4840ccfca25 ("zswap: memcg accounting")
Cc: stable(a)vger.kernel.org # 5.19+
Acked-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Muchun Song <songmuchun(a)bytedance.com>
Signed-off-by: Li Liguang <liliguang(a)baidu.com>
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
---
mm/memcontrol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
This fell through the cracks in summer as it didn't make it to the
right mailing lists. Resending.
I know it's close to 6.1-final, but the fix should be safe to put
in. It's straight-forward and obvious code-wise. We also have large
parts of production running on it without problems.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2d8549ae1b30..a1a35c12635e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3026,7 +3026,7 @@ struct obj_cgroup *get_obj_cgroup_from_page(struct page *page)
{
struct obj_cgroup *objcg;
- if (!memcg_kmem_enabled() || memcg_kmem_bypass())
+ if (!memcg_kmem_enabled())
return NULL;
if (PageMemcgKmem(page)) {
--
2.38.1
We have discovered that some power lines were always on even if the devices
on that power line were not used.
This happens when we fail to probe a device on the i2c bus and the
ACPI Power Resources are never turned off.
This patch tries to fix this issue.
To: Mika Westerberg <mika.westerberg(a)linux.intel.com>
To: Wolfram Sang <wsa(a)kernel.org>
To: Sakari Ailus <sakari.ailus(a)linux.intel.com>
To: Tomasz Figa <tfiga(a)chromium.org>
To: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Cc: Hidenori Kobayashi <hidenorik(a)google.com>
Cc: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: linux-i2c(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
---
Changes in v6:
- Add Reviewed-by: Hidenori (Thanks!).
- Set device always off at remove.
- Link to v5: https://lore.kernel.org/r/20221109-i2c-waive-v5-0-2839667f8f6a@chromium.org
Changes in v5:
- Add Cc: stable
- Add Reviewed-by Sakary (Thanks!).
- Renamed turn-off as power-off, in the name of consistency (Thanks Sergey!)
- Link to v4: https://lore.kernel.org/r/20221109-i2c-waive-v4-0-e4496462833b@chromium.org
Changes in v4:
- Rename full_power to do_power_on.
- Link to v3: https://lore.kernel.org/r/20221109-i2c-waive-v3-0-d8651cb4b88d@chromium.org
Changes in v3:
- Introduce full_power variable to make more clear what we are doing.
- Link to v2: https://lore.kernel.org/r/20221109-i2c-waive-v2-0-07550bf2dacc@chromium.org
Changes in v2:
- Cover also device remove
- Link to v1: https://lore.kernel.org/r/20221109-i2c-waive-v1-0-ed70a99b990d@chromium.org
---
Ricardo Ribalda (1):
i2c: Restore initial power state if probe fails
drivers/i2c/i2c-core-base.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
---
base-commit: f141df371335645ce29a87d9683a3f79fba7fd67
change-id: 20221109-i2c-waive-ae97fea1f1b5
Best regards,
--
Ricardo Ribalda <ribalda(a)chromium.org>
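Since the diff itself is not quoted in this cover letter, here is a rough, runnable sketch of the idea only; the struct and helpers below are hypothetical placeholders, not the real i2c-core or ACPI power-resource API.
#include <stdbool.h>
#include <stdio.h>
/* Hypothetical stand-in for a device and its ACPI power resources. */
struct dev {
	bool was_powered;	/* initial power state before probing */
	bool powered;
};
static void power_set(struct dev *d, bool on)
{
	d->powered = on;
	printf("power %s\n", on ? "on" : "off");
}
static int failing_probe(struct dev *d)
{
	(void)d;
	return -1;	/* simulate a probe failure */
}
/*
 * Sketch: power the device on for probing only if it was not already on,
 * and restore the initial power state if probe fails, instead of leaving
 * the power resources on forever.
 */
static int probe_device(struct dev *d, int (*probe)(struct dev *))
{
	bool do_power_on = !d->was_powered;
	int ret;
	if (do_power_on)
		power_set(d, true);
	ret = probe(d);
	if (ret && do_power_on)
		power_set(d, false);
	return ret;
}
int main(void)
{
	struct dev d = { .was_powered = false, .powered = false };
	probe_device(&d, failing_probe);
	printf("final state: %s\n", d.powered ? "on" : "off");
	return 0;
}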
A DAMON sysfs interface user can start DAMON with a scheme, remove the
sysfs directory for the scheme, and then ask for an update of the scheme's
stats. Because the scheme stats update logic isn't aware of the
situation, it results in an invalid memory access. Fix the bug by
checking if the scheme sysfs directory exists.
Fixes: 0ac32b8affb5 ("mm/damon/sysfs: support DAMOS stats")
Cc: <stable(a)vger.kernel.org> # v5.18
Signed-off-by: SeongJae Park <sj(a)kernel.org>
---
Note: There are DAMON code refactoring patches in mm-stable. Because the
refactoring changes the code that this fix touches, and this fix is a
v6.1 hotfix, this patch is based on the latest mainline, not on
mm-unstable. In other words, this patch cannot be cleanly applied on
mm-unstable. You can get a version of this patch based on the latest
mm-unstable via the damon/next tree[1], though.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/sj/linux.git/tree/?h=damon/…
mm/damon/sysfs.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/mm/damon/sysfs.c b/mm/damon/sysfs.c
index 9f1219a67e3f..9701ef178a4d 100644
--- a/mm/damon/sysfs.c
+++ b/mm/damon/sysfs.c
@@ -2339,6 +2339,10 @@ static int damon_sysfs_upd_schemes_stats(struct damon_sysfs_kdamond *kdamond)
damon_for_each_scheme(scheme, ctx) {
struct damon_sysfs_stats *sysfs_stats;
+ /* user could removed the scheme sysfs dir */
+ if (schemes_idx >= sysfs_schemes->nr)
+ break;
+
sysfs_stats = sysfs_schemes->schemes_arr[schemes_idx++]->stats;
sysfs_stats->nr_tried = scheme->stat.nr_tried;
sysfs_stats->sz_tried = scheme->stat.sz_tried;
--
2.25.1
From: Xiubo Li <xiubli(a)redhat.com>
When ceph releases the file_lock, it tries to get the inode pointer
from fl->fl_file, but that memory could already have been released by
another thread in filp_close(), because the VFS layer does not take a
reference on the file for fl->fl_file.
Switch to ceph-dedicated lock info to track the inode.
In ceph_fl_release_lock() we should skip all the operations if
fl->fl_u.ceph_fl.fl_inode is not set, which is the case for a request
file_lock. We set fl->fl_u.ceph_fl.fl_inode when inserting the lock
into the inode lock list, which is when copying the lock.
Cc: stable(a)vger.kernel.org
URL: https://tracker.ceph.com/issues/57986
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
---
fs/ceph/locks.c | 18 +++++++++++++++---
include/linux/ceph/ceph_fs_fl.h | 26 ++++++++++++++++++++++++++
include/linux/fs.h | 2 ++
3 files changed, 43 insertions(+), 3 deletions(-)
create mode 100644 include/linux/ceph/ceph_fs_fl.h
diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
index 3e2843e86e27..d8385dd0076e 100644
--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -34,22 +34,34 @@ static void ceph_fl_copy_lock(struct file_lock *dst, struct file_lock *src)
{
struct ceph_file_info *fi = dst->fl_file->private_data;
struct inode *inode = file_inode(dst->fl_file);
+
atomic_inc(&ceph_inode(inode)->i_filelock_ref);
atomic_inc(&fi->num_locks);
+ dst->fl_u.ceph_fl.fl_inode = igrab(inode);
}
static void ceph_fl_release_lock(struct file_lock *fl)
{
struct ceph_file_info *fi = fl->fl_file->private_data;
- struct inode *inode = file_inode(fl->fl_file);
- struct ceph_inode_info *ci = ceph_inode(inode);
- atomic_dec(&fi->num_locks);
+ struct inode *inode = fl->fl_u.ceph_fl.fl_inode;
+ struct ceph_inode_info *ci;
+
+ /*
+ * If inode is NULL it should be a request file_lock,
+ * nothing we can do.
+ */
+ if (!inode)
+ return;
+
+ ci = ceph_inode(inode);
if (atomic_dec_and_test(&ci->i_filelock_ref)) {
/* clear error when all locks are released */
spin_lock(&ci->i_ceph_lock);
ci->i_ceph_flags &= ~CEPH_I_ERROR_FILELOCK;
spin_unlock(&ci->i_ceph_lock);
}
+ iput(inode);
+ atomic_dec(&fi->num_locks);
}
static const struct file_lock_operations ceph_fl_lock_ops = {
diff --git a/include/linux/ceph/ceph_fs_fl.h b/include/linux/ceph/ceph_fs_fl.h
new file mode 100644
index 000000000000..02a412b26095
--- /dev/null
+++ b/include/linux/ceph/ceph_fs_fl.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * ceph_fs.h - Ceph constants and data types to share between kernel and
+ * user space.
+ *
+ * Most types in this file are defined as little-endian, and are
+ * primarily intended to describe data structures that pass over the
+ * wire or that are stored on disk.
+ *
+ * LGPL2
+ */
+
+#ifndef CEPH_FS_FL_H
+#define CEPH_FS_FL_H
+
+#include <linux/fs.h>
+
+/*
+ * Ceph lock info
+ */
+
+struct ceph_lock_info {
+ struct inode *fl_inode;
+};
+
+#endif
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e654435f1651..db4810d19e26 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1066,6 +1066,7 @@ bool opens_in_grace(struct net *);
/* that will die - we need it for nfs_lock_info */
#include <linux/nfs_fs_i.h>
+#include <linux/ceph/ceph_fs_fl.h>
/*
* struct file_lock represents a generic "file lock". It's used to represent
@@ -1119,6 +1120,7 @@ struct file_lock {
int state; /* state of grant or error if -ve */
unsigned int debug_id;
} afs;
+ struct ceph_lock_info ceph_fl;
} fl_u;
} __randomize_layout;
--
2.31.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
3a876060892b ("drm/amdkfd: Migrate in CPU page fault use current mm")
acac270d0982 ("drm/amdkfd: Add migration SMI event")
e0f1e65b836c ("drm/amdkfd: Add GPU recoverable fault SMI event")
9527b9caf82b ("drm/amdkfd: evict svm bo worker handle error")
7ad153db5859 ("drm/amdkfd: handle VMA remove race")
740a451b0797 ("drm/amdkfd: Handle incomplete migration to system memory")
33c6bd989d5e ("drm/amdkfd: debug message to count successfully migrated pages")
75fa98d6e458 ("drm/amdkfd: clarify the origin of cpages returned by migration functions")
ca432dcc27a1 ("drm/amdkfd: handle svm partial migration cpages 0")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3a876060892ba52dd67d197c78b955e62657d906 Mon Sep 17 00:00:00 2001
From: Philip Yang <Philip.Yang(a)amd.com>
Date: Thu, 8 Sep 2022 17:56:09 -0400
Subject: [PATCH] drm/amdkfd: Migrate in CPU page fault use current mm
migrate_vma_setup() shows the warning below because we don't hold the other
process's mm mmap_lock. We should use the current vmf->vma->vm_mm instead;
the caller already holds the current mmap lock inside the CPU page fault
handler.
WARNING: CPU: 10 PID: 3054 at include/linux/mmap_lock.h:155 find_vma
Call Trace:
walk_page_range+0x76/0x150
migrate_vma_setup+0x18a/0x640
svm_migrate_vram_to_ram+0x245/0xa10 [amdgpu]
svm_migrate_to_ram+0x36f/0x470 [amdgpu]
do_swap_page+0xcfe/0xec0
__handle_mm_fault+0x96b/0x15e0
handle_mm_fault+0x13f/0x3e0
do_user_addr_fault+0x1e7/0x690
Fixes: e1f84eef313f ("drm/amdkfd: handle CPU fault on COW mapping")
Signed-off-by: Philip Yang <Philip.Yang(a)amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 929dec1da0e4..7522bf2d2f57 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -949,7 +949,8 @@ static vm_fault_t svm_migrate_to_ram(struct vm_fault *vmf)
goto out_unlock_prange;
}
- r = svm_migrate_vram_to_ram(prange, mm, KFD_MIGRATE_TRIGGER_PAGEFAULT_CPU);
+ r = svm_migrate_vram_to_ram(prange, vmf->vma->vm_mm,
+ KFD_MIGRATE_TRIGGER_PAGEFAULT_CPU);
if (r)
pr_debug("failed %d migrate svms 0x%p range 0x%p [0x%lx 0x%lx]\n",
r, prange->svms, prange, prange->start, prange->last);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
ba2423633ba6 ("dmaengine: at_hdmac: Fix descriptor handling when issuing it to hardware")
c6babed879fb ("dmaengine: at_hdmac: Fix concurrency problems by removing atc_complete_all()")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ba2423633ba646e1df20e30cb3cf35495c16f173 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:45 +0300
Subject: [PATCH] dmaengine: at_hdmac: Fix descriptor handling when issuing it
to hardware
As it was before, the descriptor was issued to the hardware without being
added to the active (issued) list. This could result in the completion of a
different descriptor, and/or in the descriptor never being completed.
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-12-tudor.ambarus@microchip.…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index b53a9fc15dd9..9e5a30396c1c 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -510,8 +510,11 @@ static void atc_advance_work(struct at_dma_chan *atchan)
/* advance work */
spin_lock_irqsave(&atchan->lock, flags);
- if (!list_empty(&atchan->active_list))
- atc_dostart(atchan, atc_first_active(atchan));
+ if (!list_empty(&atchan->active_list)) {
+ desc = atc_first_queued(atchan);
+ list_move_tail(&desc->desc_node, &atchan->active_list);
+ atc_dostart(atchan, desc);
+ }
spin_unlock_irqrestore(&atchan->lock, flags);
}
@@ -523,6 +526,7 @@ static void atc_advance_work(struct at_dma_chan *atchan)
static void atc_handle_error(struct at_dma_chan *atchan)
{
struct at_desc *bad_desc;
+ struct at_desc *desc;
struct at_desc *child;
unsigned long flags;
@@ -540,8 +544,11 @@ static void atc_handle_error(struct at_dma_chan *atchan)
list_splice_init(&atchan->queue, atchan->active_list.prev);
/* Try to restart the controller */
- if (!list_empty(&atchan->active_list))
- atc_dostart(atchan, atc_first_active(atchan));
+ if (!list_empty(&atchan->active_list)) {
+ desc = atc_first_queued(atchan);
+ list_move_tail(&desc->desc_node, &atchan->active_list);
+ atc_dostart(atchan, desc);
+ }
/*
* KERN_CRITICAL may seem harsh, but since this only happens
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
ba2423633ba6 ("dmaengine: at_hdmac: Fix descriptor handling when issuing it to hardware")
c6babed879fb ("dmaengine: at_hdmac: Fix concurrency problems by removing atc_complete_all()")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ba2423633ba646e1df20e30cb3cf35495c16f173 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:45 +0300
Subject: [PATCH] dmaengine: at_hdmac: Fix descriptor handling when issuing it
to hardware
As it was before, the descriptor was issued to the hardware without adding
it to the active (issued) list. This could result in the completion of another
descriptor, and/or in the descriptor never being completed.
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-12-tudor.ambarus@microchip.…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index b53a9fc15dd9..9e5a30396c1c 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -510,8 +510,11 @@ static void atc_advance_work(struct at_dma_chan *atchan)
/* advance work */
spin_lock_irqsave(&atchan->lock, flags);
- if (!list_empty(&atchan->active_list))
- atc_dostart(atchan, atc_first_active(atchan));
+ if (!list_empty(&atchan->active_list)) {
+ desc = atc_first_queued(atchan);
+ list_move_tail(&desc->desc_node, &atchan->active_list);
+ atc_dostart(atchan, desc);
+ }
spin_unlock_irqrestore(&atchan->lock, flags);
}
@@ -523,6 +526,7 @@ static void atc_advance_work(struct at_dma_chan *atchan)
static void atc_handle_error(struct at_dma_chan *atchan)
{
struct at_desc *bad_desc;
+ struct at_desc *desc;
struct at_desc *child;
unsigned long flags;
@@ -540,8 +544,11 @@ static void atc_handle_error(struct at_dma_chan *atchan)
list_splice_init(&atchan->queue, atchan->active_list.prev);
/* Try to restart the controller */
- if (!list_empty(&atchan->active_list))
- atc_dostart(atchan, atc_first_active(atchan));
+ if (!list_empty(&atchan->active_list)) {
+ desc = atc_first_queued(atchan);
+ list_move_tail(&desc->desc_node, &atchan->active_list);
+ atc_dostart(atchan, desc);
+ }
/*
* KERN_CRITICAL may seem harsh, but since this only happens
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
ba2423633ba6 ("dmaengine: at_hdmac: Fix descriptor handling when issuing it to hardware")
c6babed879fb ("dmaengine: at_hdmac: Fix concurrency problems by removing atc_complete_all()")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ba2423633ba646e1df20e30cb3cf35495c16f173 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:45 +0300
Subject: [PATCH] dmaengine: at_hdmac: Fix descriptor handling when issuing it
to hardware
As it was before, the descriptor was issued to the hardware without adding
it to the active (issued) list. This could result in the completion of another
descriptor, and/or in the descriptor never being completed.
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-12-tudor.ambarus@microchip.…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index b53a9fc15dd9..9e5a30396c1c 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -510,8 +510,11 @@ static void atc_advance_work(struct at_dma_chan *atchan)
/* advance work */
spin_lock_irqsave(&atchan->lock, flags);
- if (!list_empty(&atchan->active_list))
- atc_dostart(atchan, atc_first_active(atchan));
+ if (!list_empty(&atchan->active_list)) {
+ desc = atc_first_queued(atchan);
+ list_move_tail(&desc->desc_node, &atchan->active_list);
+ atc_dostart(atchan, desc);
+ }
spin_unlock_irqrestore(&atchan->lock, flags);
}
@@ -523,6 +526,7 @@ static void atc_advance_work(struct at_dma_chan *atchan)
static void atc_handle_error(struct at_dma_chan *atchan)
{
struct at_desc *bad_desc;
+ struct at_desc *desc;
struct at_desc *child;
unsigned long flags;
@@ -540,8 +544,11 @@ static void atc_handle_error(struct at_dma_chan *atchan)
list_splice_init(&atchan->queue, atchan->active_list.prev);
/* Try to restart the controller */
- if (!list_empty(&atchan->active_list))
- atc_dostart(atchan, atc_first_active(atchan));
+ if (!list_empty(&atchan->active_list)) {
+ desc = atc_first_queued(atchan);
+ list_move_tail(&desc->desc_node, &atchan->active_list);
+ atc_dostart(atchan, desc);
+ }
/*
* KERN_CRITICAL may seem harsh, but since this only happens
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
ba2423633ba6 ("dmaengine: at_hdmac: Fix descriptor handling when issuing it to hardware")
c6babed879fb ("dmaengine: at_hdmac: Fix concurrency problems by removing atc_complete_all()")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ba2423633ba646e1df20e30cb3cf35495c16f173 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:45 +0300
Subject: [PATCH] dmaengine: at_hdmac: Fix descriptor handling when issuing it
to hardware
As it was before, the descriptor was issued to the hardware without adding
it to the active (issued) list. This could result in the completion of another
descriptor, and/or in the descriptor never being completed.
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-12-tudor.ambarus@microchip.…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index b53a9fc15dd9..9e5a30396c1c 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -510,8 +510,11 @@ static void atc_advance_work(struct at_dma_chan *atchan)
/* advance work */
spin_lock_irqsave(&atchan->lock, flags);
- if (!list_empty(&atchan->active_list))
- atc_dostart(atchan, atc_first_active(atchan));
+ if (!list_empty(&atchan->active_list)) {
+ desc = atc_first_queued(atchan);
+ list_move_tail(&desc->desc_node, &atchan->active_list);
+ atc_dostart(atchan, desc);
+ }
spin_unlock_irqrestore(&atchan->lock, flags);
}
@@ -523,6 +526,7 @@ static void atc_advance_work(struct at_dma_chan *atchan)
static void atc_handle_error(struct at_dma_chan *atchan)
{
struct at_desc *bad_desc;
+ struct at_desc *desc;
struct at_desc *child;
unsigned long flags;
@@ -540,8 +544,11 @@ static void atc_handle_error(struct at_dma_chan *atchan)
list_splice_init(&atchan->queue, atchan->active_list.prev);
/* Try to restart the controller */
- if (!list_empty(&atchan->active_list))
- atc_dostart(atchan, atc_first_active(atchan));
+ if (!list_empty(&atchan->active_list)) {
+ desc = atc_first_queued(atchan);
+ list_move_tail(&desc->desc_node, &atchan->active_list);
+ atc_dostart(atchan, desc);
+ }
/*
* KERN_CRITICAL may seem harsh, but since this only happens
We have discovered that some power lines were always on even if the devices
on those power lines were not used.
This happens because a device on the i2c bus failed to probe, and its
ACPI Power Resources were never turned off.
This patch fixes that issue.
To: Wolfram Sang <wsa(a)kernel.org>
To: Sakari Ailus <sakari.ailus(a)linux.intel.com>
To: Tomasz Figa <tfiga(a)chromium.org>
To: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Cc: Hidenori Kobayashi <hidenorik(a)google.com>
Cc: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: linux-i2c(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
---
Changes in v5:
- Add Cc: stable
- Add Reviewed-by from Sakari (Thanks!)
- Renamed turn-off to power-off, for consistency (Thanks Sergey!)
- Link to v4: https://lore.kernel.org/r/20221109-i2c-waive-v4-0-e4496462833b@chromium.org
Changes in v4:
- Rename full_power to do_power_on
- Link to v3: https://lore.kernel.org/r/20221109-i2c-waive-v3-0-d8651cb4b88d@chromium.org
Changes in v3:
- Introduce full_power variable to make more clear what we are doing.
- Link to v2: https://lore.kernel.org/r/20221109-i2c-waive-v2-0-07550bf2dacc@chromium.org
Changes in v2:
- Cover also device remove
- Link to v1: https://lore.kernel.org/r/20221109-i2c-waive-v1-0-ed70a99b990d@chromium.org
---
Ricardo Ribalda (1):
i2c: Restore initial power state when we are done.
drivers/i2c/i2c-core-base.c | 11 +++++++----
include/linux/i2c.h | 4 ++++
2 files changed, 11 insertions(+), 4 deletions(-)
---
base-commit: f141df371335645ce29a87d9683a3f79fba7fd67
change-id: 20221109-i2c-waive-ae97fea1f1b5
Best regards,
--
Ricardo Ribalda <ribalda(a)chromium.org>
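The cover letter carries only the diffstat, so here is a rough sketch of the intent (hypothetical helpers and types, not the actual i2c-core change): remember whether the device had to be powered on for probe, and power it back off if probe fails, so a failed probe does not leave its power resources on indefinitely.

#include <stdbool.h>

struct dev { bool powered; };

static int power_on(struct dev *d) { d->powered = true; return 0; }	/* hypothetical */
static void power_off(struct dev *d) { d->powered = false; }		/* hypothetical */
static int do_probe(struct dev *d) { (void)d; return -1; }		/* hypothetical: pretend probe failed */

static int probe_device(struct dev *d, bool do_power_on)
{
	int ret;

	if (do_power_on) {
		ret = power_on(d);
		if (ret)
			return ret;
	}

	ret = do_probe(d);
	if (ret && do_power_on)
		power_off(d);	/* restore the initial power state on failure */

	return ret;
}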
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
03ed9ba357cc ("dmaengine: at_hdmac: Fix concurrency over the active list")
6ba826cbb57d ("dmaengine: at_hdmac: Free the memset buf without holding the chan lock")
06988949df8c ("dmaengine: at_hdmac: Fix concurrency over descriptor")
c6babed879fb ("dmaengine: at_hdmac: Fix concurrency problems by removing atc_complete_all()")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 03ed9ba357cc78116164b90b87f45eacab60b561 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:44 +0300
Subject: [PATCH] dmaengine: at_hdmac: Fix concurrency over the active list
The tasklet (atc_advance_work()) did not hold the channel lock when
retrieving the first active descriptor, causing concurrency problems if
issue_pending() was called in between. If issue_pending() was called
exactly after the lock was released in the tasklet (atc_advance_work()),
atc_chain_complete() could complete a descriptor for which the controller
has not yet raised an interrupt.
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-11-tudor.ambarus@microchip.…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index 0fb44f622d35..b53a9fc15dd9 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -462,8 +462,6 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
if (!atc_chan_is_cyclic(atchan))
dma_cookie_complete(txd);
- /* Remove transfer node from the active list. */
- list_del_init(&desc->desc_node);
spin_unlock_irqrestore(&atchan->lock, flags);
dma_descriptor_unmap(txd);
@@ -495,6 +493,7 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
*/
static void atc_advance_work(struct at_dma_chan *atchan)
{
+ struct at_desc *desc;
unsigned long flags;
dev_vdbg(chan2dev(&atchan->chan_common), "advance_work\n");
@@ -502,9 +501,12 @@ static void atc_advance_work(struct at_dma_chan *atchan)
spin_lock_irqsave(&atchan->lock, flags);
if (atc_chan_is_enabled(atchan) || list_empty(&atchan->active_list))
return spin_unlock_irqrestore(&atchan->lock, flags);
- spin_unlock_irqrestore(&atchan->lock, flags);
- atc_chain_complete(atchan, atc_first_active(atchan));
+ desc = atc_first_active(atchan);
+ /* Remove the transfer node from the active list. */
+ list_del_init(&desc->desc_node);
+ spin_unlock_irqrestore(&atchan->lock, flags);
+ atc_chain_complete(atchan, desc);
/* advance work */
spin_lock_irqsave(&atchan->lock, flags);
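A userspace analogy of the locking order this fix establishes (hypothetical types, not the driver code): pick and unlink the first active descriptor while holding the lock, drop the lock, and only then run the completion, so issue_pending() cannot race against a descriptor that is being completed.

#include <pthread.h>
#include <stddef.h>

struct desc { struct desc *next; };

struct chan {
	pthread_mutex_t lock;
	struct desc *active;
};

static void complete(struct desc *d) { (void)d; /* callbacks, unmap, ... */ }

static void advance_work(struct chan *c)
{
	struct desc *d;

	pthread_mutex_lock(&c->lock);
	d = c->active;
	if (d)
		c->active = d->next;	/* unlink while still holding the lock */
	pthread_mutex_unlock(&c->lock);

	if (d)
		complete(d);		/* completion runs without the lock */
}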
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
03ed9ba357cc ("dmaengine: at_hdmac: Fix concurrency over the active list")
6ba826cbb57d ("dmaengine: at_hdmac: Free the memset buf without holding the chan lock")
06988949df8c ("dmaengine: at_hdmac: Fix concurrency over descriptor")
c6babed879fb ("dmaengine: at_hdmac: Fix concurrency problems by removing atc_complete_all()")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 03ed9ba357cc78116164b90b87f45eacab60b561 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:44 +0300
Subject: [PATCH] dmaengine: at_hdmac: Fix concurrency over the active list
The tasklet (atc_advance_work()) did not hold the channel lock when
retrieving the first active descriptor, causing concurrency problems if
issue_pending() was called in between. If issue_pending() was called
exactly after the lock was released in the tasklet (atc_advance_work()),
atc_chain_complete() could complete a descriptor for which the controller
has not yet raised an interrupt.
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-11-tudor.ambarus@microchip.…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index 0fb44f622d35..b53a9fc15dd9 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -462,8 +462,6 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
if (!atc_chan_is_cyclic(atchan))
dma_cookie_complete(txd);
- /* Remove transfer node from the active list. */
- list_del_init(&desc->desc_node);
spin_unlock_irqrestore(&atchan->lock, flags);
dma_descriptor_unmap(txd);
@@ -495,6 +493,7 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
*/
static void atc_advance_work(struct at_dma_chan *atchan)
{
+ struct at_desc *desc;
unsigned long flags;
dev_vdbg(chan2dev(&atchan->chan_common), "advance_work\n");
@@ -502,9 +501,12 @@ static void atc_advance_work(struct at_dma_chan *atchan)
spin_lock_irqsave(&atchan->lock, flags);
if (atc_chan_is_enabled(atchan) || list_empty(&atchan->active_list))
return spin_unlock_irqrestore(&atchan->lock, flags);
- spin_unlock_irqrestore(&atchan->lock, flags);
- atc_chain_complete(atchan, atc_first_active(atchan));
+ desc = atc_first_active(atchan);
+ /* Remove the transfer node from the active list. */
+ list_del_init(&desc->desc_node);
+ spin_unlock_irqrestore(&atchan->lock, flags);
+ atc_chain_complete(atchan, desc);
/* advance work */
spin_lock_irqsave(&atchan->lock, flags);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
03ed9ba357cc ("dmaengine: at_hdmac: Fix concurrency over the active list")
6ba826cbb57d ("dmaengine: at_hdmac: Free the memset buf without holding the chan lock")
06988949df8c ("dmaengine: at_hdmac: Fix concurrency over descriptor")
c6babed879fb ("dmaengine: at_hdmac: Fix concurrency problems by removing atc_complete_all()")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 03ed9ba357cc78116164b90b87f45eacab60b561 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:44 +0300
Subject: [PATCH] dmaengine: at_hdmac: Fix concurrency over the active list
The tasklet (atc_advance_work()) did not hold the channel lock when
retrieving the first active descriptor, causing concurrency problems if
issue_pending() was called in between. If issue_pending() was called
exactly after the lock was released in the tasklet (atc_advance_work()),
atc_chain_complete() could complete a descriptor for which the controller
has not yet raised an interrupt.
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-11-tudor.ambarus@microchip.…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index 0fb44f622d35..b53a9fc15dd9 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -462,8 +462,6 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
if (!atc_chan_is_cyclic(atchan))
dma_cookie_complete(txd);
- /* Remove transfer node from the active list. */
- list_del_init(&desc->desc_node);
spin_unlock_irqrestore(&atchan->lock, flags);
dma_descriptor_unmap(txd);
@@ -495,6 +493,7 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
*/
static void atc_advance_work(struct at_dma_chan *atchan)
{
+ struct at_desc *desc;
unsigned long flags;
dev_vdbg(chan2dev(&atchan->chan_common), "advance_work\n");
@@ -502,9 +501,12 @@ static void atc_advance_work(struct at_dma_chan *atchan)
spin_lock_irqsave(&atchan->lock, flags);
if (atc_chan_is_enabled(atchan) || list_empty(&atchan->active_list))
return spin_unlock_irqrestore(&atchan->lock, flags);
- spin_unlock_irqrestore(&atchan->lock, flags);
- atc_chain_complete(atchan, atc_first_active(atchan));
+ desc = atc_first_active(atchan);
+ /* Remove the transfer node from the active list. */
+ list_del_init(&desc->desc_node);
+ spin_unlock_irqrestore(&atchan->lock, flags);
+ atc_chain_complete(atchan, desc);
/* advance work */
spin_lock_irqsave(&atchan->lock, flags);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
03ed9ba357cc ("dmaengine: at_hdmac: Fix concurrency over the active list")
6ba826cbb57d ("dmaengine: at_hdmac: Free the memset buf without holding the chan lock")
06988949df8c ("dmaengine: at_hdmac: Fix concurrency over descriptor")
c6babed879fb ("dmaengine: at_hdmac: Fix concurrency problems by removing atc_complete_all()")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 03ed9ba357cc78116164b90b87f45eacab60b561 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:44 +0300
Subject: [PATCH] dmaengine: at_hdmac: Fix concurrency over the active list
The tasklet (atc_advance_work()) did not hold the channel lock when
retrieving the first active descriptor, causing concurrency problems if
issue_pending() was called in between. If issue_pending() was called
exactly after the lock was released in the tasklet (atc_advance_work()),
atc_chain_complete() could complete a descriptor for which the controller
has not yet raised an interrupt.
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-11-tudor.ambarus@microchip.…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index 0fb44f622d35..b53a9fc15dd9 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -462,8 +462,6 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
if (!atc_chan_is_cyclic(atchan))
dma_cookie_complete(txd);
- /* Remove transfer node from the active list. */
- list_del_init(&desc->desc_node);
spin_unlock_irqrestore(&atchan->lock, flags);
dma_descriptor_unmap(txd);
@@ -495,6 +493,7 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
*/
static void atc_advance_work(struct at_dma_chan *atchan)
{
+ struct at_desc *desc;
unsigned long flags;
dev_vdbg(chan2dev(&atchan->chan_common), "advance_work\n");
@@ -502,9 +501,12 @@ static void atc_advance_work(struct at_dma_chan *atchan)
spin_lock_irqsave(&atchan->lock, flags);
if (atc_chan_is_enabled(atchan) || list_empty(&atchan->active_list))
return spin_unlock_irqrestore(&atchan->lock, flags);
- spin_unlock_irqrestore(&atchan->lock, flags);
- atc_chain_complete(atchan, atc_first_active(atchan));
+ desc = atc_first_active(atchan);
+ /* Remove the transfer node from the active list. */
+ list_del_init(&desc->desc_node);
+ spin_unlock_irqrestore(&atchan->lock, flags);
+ atc_chain_complete(atchan, desc);
/* advance work */
spin_lock_irqsave(&atchan->lock, flags);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
6ba826cbb57d ("dmaengine: at_hdmac: Free the memset buf without holding the chan lock")
06988949df8c ("dmaengine: at_hdmac: Fix concurrency over descriptor")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6ba826cbb57d675f447b59323204d1473bbd5593 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:43 +0300
Subject: [PATCH] dmaengine: at_hdmac: Free the memset buf without holding the
chan lock
There's no need to hold the channel lock when freeing the memset buf, as
the operation has already completed. Free the memset buf without holding
the channel lock.
Fixes: 4d112426c344 ("dmaengine: hdmac: Add memset capabilities")
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-10-tudor.ambarus@microchip.…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index 2012ecc57826..0fb44f622d35 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -462,13 +462,6 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
if (!atc_chan_is_cyclic(atchan))
dma_cookie_complete(txd);
- /* If the transfer was a memset, free our temporary buffer */
- if (desc->memset_buffer) {
- dma_pool_free(atdma->memset_pool, desc->memset_vaddr,
- desc->memset_paddr);
- desc->memset_buffer = false;
- }
-
/* Remove transfer node from the active list. */
list_del_init(&desc->desc_node);
spin_unlock_irqrestore(&atchan->lock, flags);
@@ -487,6 +480,13 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
/* add myself to free_list */
list_add(&desc->desc_node, &atchan->free_list);
spin_unlock_irqrestore(&atchan->lock, flags);
+
+ /* If the transfer was a memset, free our temporary buffer */
+ if (desc->memset_buffer) {
+ dma_pool_free(atdma->memset_pool, desc->memset_vaddr,
+ desc->memset_paddr);
+ desc->memset_buffer = false;
+ }
}
/**
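The same pattern in a self-contained userspace sketch (plain malloc/free standing in for the DMA pool, hypothetical types): do the list bookkeeping under the channel lock, then free the temporary memset buffer only after the lock has been dropped, since the free needs no lock protection.

#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct desc {
	void *memset_vaddr;
	bool  memset_buffer;
};

struct chan { pthread_mutex_t lock; };

static void chain_complete(struct chan *c, struct desc *d)
{
	pthread_mutex_lock(&c->lock);
	/* ...list bookkeeping that actually needs the lock... */
	pthread_mutex_unlock(&c->lock);

	/* Free the temporary buffer without holding the channel lock. */
	if (d->memset_buffer) {
		free(d->memset_vaddr);
		d->memset_buffer = false;
	}
}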
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
6ba826cbb57d ("dmaengine: at_hdmac: Free the memset buf without holding the chan lock")
06988949df8c ("dmaengine: at_hdmac: Fix concurrency over descriptor")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6ba826cbb57d675f447b59323204d1473bbd5593 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:43 +0300
Subject: [PATCH] dmaengine: at_hdmac: Free the memset buf without holding the
chan lock
There's no need to hold the channel lock when freeing the memset buf, as
the operation has already completed. Free the memset buf without holding
the channel lock.
Fixes: 4d112426c344 ("dmaengine: hdmac: Add memset capabilities")
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-10-tudor.ambarus@microchip.…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index 2012ecc57826..0fb44f622d35 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -462,13 +462,6 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
if (!atc_chan_is_cyclic(atchan))
dma_cookie_complete(txd);
- /* If the transfer was a memset, free our temporary buffer */
- if (desc->memset_buffer) {
- dma_pool_free(atdma->memset_pool, desc->memset_vaddr,
- desc->memset_paddr);
- desc->memset_buffer = false;
- }
-
/* Remove transfer node from the active list. */
list_del_init(&desc->desc_node);
spin_unlock_irqrestore(&atchan->lock, flags);
@@ -487,6 +480,13 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
/* add myself to free_list */
list_add(&desc->desc_node, &atchan->free_list);
spin_unlock_irqrestore(&atchan->lock, flags);
+
+ /* If the transfer was a memset, free our temporary buffer */
+ if (desc->memset_buffer) {
+ dma_pool_free(atdma->memset_pool, desc->memset_vaddr,
+ desc->memset_paddr);
+ desc->memset_buffer = false;
+ }
}
/**
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
6ba826cbb57d ("dmaengine: at_hdmac: Free the memset buf without holding the chan lock")
06988949df8c ("dmaengine: at_hdmac: Fix concurrency over descriptor")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6ba826cbb57d675f447b59323204d1473bbd5593 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:43 +0300
Subject: [PATCH] dmaengine: at_hdmac: Free the memset buf without holding the
chan lock
There's no need to hold the channel lock when freeing the memset buf, as
the operation has already completed. Free the memset buf without holding
the channel lock.
Fixes: 4d112426c344 ("dmaengine: hdmac: Add memset capabilities")
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-10-tudor.ambarus@microchip.…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index 2012ecc57826..0fb44f622d35 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -462,13 +462,6 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
if (!atc_chan_is_cyclic(atchan))
dma_cookie_complete(txd);
- /* If the transfer was a memset, free our temporary buffer */
- if (desc->memset_buffer) {
- dma_pool_free(atdma->memset_pool, desc->memset_vaddr,
- desc->memset_paddr);
- desc->memset_buffer = false;
- }
-
/* Remove transfer node from the active list. */
list_del_init(&desc->desc_node);
spin_unlock_irqrestore(&atchan->lock, flags);
@@ -487,6 +480,13 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
/* add myself to free_list */
list_add(&desc->desc_node, &atchan->free_list);
spin_unlock_irqrestore(&atchan->lock, flags);
+
+ /* If the transfer was a memset, free our temporary buffer */
+ if (desc->memset_buffer) {
+ dma_pool_free(atdma->memset_pool, desc->memset_vaddr,
+ desc->memset_paddr);
+ desc->memset_buffer = false;
+ }
}
/**
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
6ba826cbb57d ("dmaengine: at_hdmac: Free the memset buf without holding the chan lock")
06988949df8c ("dmaengine: at_hdmac: Fix concurrency over descriptor")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6ba826cbb57d675f447b59323204d1473bbd5593 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:43 +0300
Subject: [PATCH] dmaengine: at_hdmac: Free the memset buf without holding the
chan lock
There's no need to hold the channel lock when freeing the memset buf, as
the operation has already completed. Free the memset buf without holding
the channel lock.
Fixes: 4d112426c344 ("dmaengine: hdmac: Add memset capabilities")
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-10-tudor.ambarus@microchip.…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index 2012ecc57826..0fb44f622d35 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -462,13 +462,6 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
if (!atc_chan_is_cyclic(atchan))
dma_cookie_complete(txd);
- /* If the transfer was a memset, free our temporary buffer */
- if (desc->memset_buffer) {
- dma_pool_free(atdma->memset_pool, desc->memset_vaddr,
- desc->memset_paddr);
- desc->memset_buffer = false;
- }
-
/* Remove transfer node from the active list. */
list_del_init(&desc->desc_node);
spin_unlock_irqrestore(&atchan->lock, flags);
@@ -487,6 +480,13 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
/* add myself to free_list */
list_add(&desc->desc_node, &atchan->free_list);
spin_unlock_irqrestore(&atchan->lock, flags);
+
+ /* If the transfer was a memset, free our temporary buffer */
+ if (desc->memset_buffer) {
+ dma_pool_free(atdma->memset_pool, desc->memset_vaddr,
+ desc->memset_paddr);
+ desc->memset_buffer = false;
+ }
}
/**
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
06988949df8c ("dmaengine: at_hdmac: Fix concurrency over descriptor")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 06988949df8c3007ad82036d3606d8ae72ed9000 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:42 +0300
Subject: [PATCH] dmaengine: at_hdmac: Fix concurrency over descriptor
The descriptor was added to the free_list before the callback was invoked,
which could result in the same descriptor being reissued and a single
callback being invoked for both. Move the descriptor to the free list after
the callback is invoked.
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-9-tudor.ambarus@microchip.c…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index f1e6fa6af6c2..2012ecc57826 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -469,11 +469,8 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
desc->memset_buffer = false;
}
- /* move children to free_list */
- list_splice_init(&desc->tx_list, &atchan->free_list);
- /* move myself to free_list */
- list_move(&desc->desc_node, &atchan->free_list);
-
+ /* Remove transfer node from the active list. */
+ list_del_init(&desc->desc_node);
spin_unlock_irqrestore(&atchan->lock, flags);
dma_descriptor_unmap(txd);
@@ -483,6 +480,13 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
dmaengine_desc_get_callback_invoke(txd, NULL);
dma_run_dependencies(txd);
+
+ spin_lock_irqsave(&atchan->lock, flags);
+ /* move children to free_list */
+ list_splice_init(&desc->tx_list, &atchan->free_list);
+ /* add myself to free_list */
+ list_add(&desc->desc_node, &atchan->free_list);
+ spin_unlock_irqrestore(&atchan->lock, flags);
}
/**
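Sketched in userspace terms (hypothetical types, not the driver code), the rule this fix enforces is: run the callback first, and only afterwards put the descriptor back on the free list, so the callback can never observe a descriptor that has already been handed out again.

#include <pthread.h>
#include <stddef.h>

struct desc {
	struct desc *next;
	void (*callback)(void *arg);
	void *arg;
};

struct chan {
	pthread_mutex_t lock;
	struct desc *free_list;
};

static void chain_complete(struct chan *c, struct desc *d)
{
	if (d->callback)
		d->callback(d->arg);	/* invoke the callback first... */

	pthread_mutex_lock(&c->lock);
	d->next = c->free_list;		/* ...then recycle the descriptor */
	c->free_list = d;
	pthread_mutex_unlock(&c->lock);
}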
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
06988949df8c ("dmaengine: at_hdmac: Fix concurrency over descriptor")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 06988949df8c3007ad82036d3606d8ae72ed9000 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:42 +0300
Subject: [PATCH] dmaengine: at_hdmac: Fix concurrency over descriptor
The descriptor was added to the free_list before the callback was invoked,
which could result in the same descriptor being reissued and a single
callback being invoked for both. Move the descriptor to the free list after
the callback is invoked.
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-9-tudor.ambarus@microchip.c…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index f1e6fa6af6c2..2012ecc57826 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -469,11 +469,8 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
desc->memset_buffer = false;
}
- /* move children to free_list */
- list_splice_init(&desc->tx_list, &atchan->free_list);
- /* move myself to free_list */
- list_move(&desc->desc_node, &atchan->free_list);
-
+ /* Remove transfer node from the active list. */
+ list_del_init(&desc->desc_node);
spin_unlock_irqrestore(&atchan->lock, flags);
dma_descriptor_unmap(txd);
@@ -483,6 +480,13 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
dmaengine_desc_get_callback_invoke(txd, NULL);
dma_run_dependencies(txd);
+
+ spin_lock_irqsave(&atchan->lock, flags);
+ /* move children to free_list */
+ list_splice_init(&desc->tx_list, &atchan->free_list);
+ /* add myself to free_list */
+ list_add(&desc->desc_node, &atchan->free_list);
+ spin_unlock_irqrestore(&atchan->lock, flags);
}
/**
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
06988949df8c ("dmaengine: at_hdmac: Fix concurrency over descriptor")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 06988949df8c3007ad82036d3606d8ae72ed9000 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:42 +0300
Subject: [PATCH] dmaengine: at_hdmac: Fix concurrency over descriptor
The descriptor was added to the free_list before the callback was invoked,
which could result in the same descriptor being reissued and a single
callback being invoked for both. Move the descriptor to the free list after
the callback is invoked.
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-9-tudor.ambarus@microchip.c…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index f1e6fa6af6c2..2012ecc57826 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -469,11 +469,8 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
desc->memset_buffer = false;
}
- /* move children to free_list */
- list_splice_init(&desc->tx_list, &atchan->free_list);
- /* move myself to free_list */
- list_move(&desc->desc_node, &atchan->free_list);
-
+ /* Remove transfer node from the active list. */
+ list_del_init(&desc->desc_node);
spin_unlock_irqrestore(&atchan->lock, flags);
dma_descriptor_unmap(txd);
@@ -483,6 +480,13 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
dmaengine_desc_get_callback_invoke(txd, NULL);
dma_run_dependencies(txd);
+
+ spin_lock_irqsave(&atchan->lock, flags);
+ /* move children to free_list */
+ list_splice_init(&desc->tx_list, &atchan->free_list);
+ /* add myself to free_list */
+ list_add(&desc->desc_node, &atchan->free_list);
+ spin_unlock_irqrestore(&atchan->lock, flags);
}
/**
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
06988949df8c ("dmaengine: at_hdmac: Fix concurrency over descriptor")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 06988949df8c3007ad82036d3606d8ae72ed9000 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:42 +0300
Subject: [PATCH] dmaengine: at_hdmac: Fix concurrency over descriptor
The descriptor was added to the free_list before the callback was invoked,
which could result in the same descriptor being reissued and a single
callback being invoked for both. Move the descriptor to the free list after
the callback is invoked.
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-9-tudor.ambarus@microchip.c…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index f1e6fa6af6c2..2012ecc57826 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -469,11 +469,8 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
desc->memset_buffer = false;
}
- /* move children to free_list */
- list_splice_init(&desc->tx_list, &atchan->free_list);
- /* move myself to free_list */
- list_move(&desc->desc_node, &atchan->free_list);
-
+ /* Remove transfer node from the active list. */
+ list_del_init(&desc->desc_node);
spin_unlock_irqrestore(&atchan->lock, flags);
dma_descriptor_unmap(txd);
@@ -483,6 +480,13 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
dmaengine_desc_get_callback_invoke(txd, NULL);
dma_run_dependencies(txd);
+
+ spin_lock_irqsave(&atchan->lock, flags);
+ /* move children to free_list */
+ list_splice_init(&desc->tx_list, &atchan->free_list);
+ /* add myself to free_list */
+ list_add(&desc->desc_node, &atchan->free_list);
+ spin_unlock_irqrestore(&atchan->lock, flags);
}
/**
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
c6babed879fb ("dmaengine: at_hdmac: Fix concurrency problems by removing atc_complete_all()")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c6babed879fbe82796a601bf097649e07382db46 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:41 +0300
Subject: [PATCH] dmaengine: at_hdmac: Fix concurrency problems by removing
atc_complete_all()
atc_complete_all() had concurrency bugs, thus remove it:
1/ atc_complete_all() in its entirety was buggy: when the atchan->queue
list (the one that contains descriptors that are not yet issued to the
hardware) contained descriptors, it started only the first one from
atchan->queue, but moved all descriptors from atchan->queue to
atchan->active_list and considered them all as started. This could result in
invoking the completion of a descriptor that was not yet issued to the
hardware.
2/ in the tasklet, at atc_advance_work() time, atchan->active_list was
queried without holding the channel lock. This can result in concurrency
problems on atchan->active_list between the tasklet and
issue_pending().
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-8-tudor.ambarus@microchip.c…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index deb4c6027436..f1e6fa6af6c2 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -485,42 +485,6 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
dma_run_dependencies(txd);
}
-/**
- * atc_complete_all - finish work for all transactions
- * @atchan: channel to complete transactions for
- *
- * Eventually submit queued descriptors if any
- *
- * Assume channel is idle while calling this function
- * Called with atchan->lock held and bh disabled
- */
-static void atc_complete_all(struct at_dma_chan *atchan)
-{
- struct at_desc *desc, *_desc;
- LIST_HEAD(list);
- unsigned long flags;
-
- dev_vdbg(chan2dev(&atchan->chan_common), "complete all\n");
-
- spin_lock_irqsave(&atchan->lock, flags);
-
- /*
- * Submit queued descriptors ASAP, i.e. before we go through
- * the completed ones.
- */
- if (!list_empty(&atchan->queue))
- atc_dostart(atchan, atc_first_queued(atchan));
- /* empty active_list now it is completed */
- list_splice_init(&atchan->active_list, &list);
- /* empty queue list by moving descriptors (if any) to active_list */
- list_splice_init(&atchan->queue, &atchan->active_list);
-
- spin_unlock_irqrestore(&atchan->lock, flags);
-
- list_for_each_entry_safe(desc, _desc, &list, desc_node)
- atc_chain_complete(atchan, desc);
-}
-
/**
* atc_advance_work - at the end of a transaction, move forward
* @atchan: channel where the transaction ended
@@ -528,25 +492,20 @@ static void atc_complete_all(struct at_dma_chan *atchan)
static void atc_advance_work(struct at_dma_chan *atchan)
{
unsigned long flags;
- int ret;
dev_vdbg(chan2dev(&atchan->chan_common), "advance_work\n");
spin_lock_irqsave(&atchan->lock, flags);
- ret = atc_chan_is_enabled(atchan);
+ if (atc_chan_is_enabled(atchan) || list_empty(&atchan->active_list))
+ return spin_unlock_irqrestore(&atchan->lock, flags);
spin_unlock_irqrestore(&atchan->lock, flags);
- if (ret)
- return;
-
- if (list_empty(&atchan->active_list) ||
- list_is_singular(&atchan->active_list))
- return atc_complete_all(atchan);
atc_chain_complete(atchan, atc_first_active(atchan));
/* advance work */
spin_lock_irqsave(&atchan->lock, flags);
- atc_dostart(atchan, atc_first_active(atchan));
+ if (!list_empty(&atchan->active_list))
+ atc_dostart(atchan, atc_first_active(atchan));
spin_unlock_irqrestore(&atchan->lock, flags);
}
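Roughly the shape the driver takes after this change, as a userspace sketch (hypothetical types, not the at_hdmac code): advance one descriptor per pass, with the hardware-state check and every list access done under the channel lock, instead of a batch "complete all" that could complete descriptors the hardware still owns.

#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct desc { struct desc *next; };

struct chan {
	pthread_mutex_t lock;
	bool hw_busy;
	struct desc *active;
};

static void complete_one(struct desc *d) { (void)d; }
static void start_hw(struct desc *d) { (void)d; }

static void advance_work(struct chan *c)
{
	struct desc *d;

	pthread_mutex_lock(&c->lock);
	if (c->hw_busy || !c->active) {
		pthread_mutex_unlock(&c->lock);
		return;
	}
	d = c->active;
	c->active = d->next;		/* take exactly one finished descriptor */
	pthread_mutex_unlock(&c->lock);

	complete_one(d);

	pthread_mutex_lock(&c->lock);
	if (c->active)
		start_hw(c->active);	/* restart the hardware on the next one */
	pthread_mutex_unlock(&c->lock);
}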
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
c6babed879fb ("dmaengine: at_hdmac: Fix concurrency problems by removing atc_complete_all()")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c6babed879fbe82796a601bf097649e07382db46 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:41 +0300
Subject: [PATCH] dmaengine: at_hdmac: Fix concurrency problems by removing
atc_complete_all()
atc_complete_all() had concurrency bugs, thus remove it:
1/ atc_complete_all() in its entirety was buggy: when the atchan->queue
list (the one that contains descriptors that are not yet issued to the
hardware) contained descriptors, it started only the first one from
atchan->queue, but moved all descriptors from atchan->queue to
atchan->active_list and considered them all as started. This could result in
invoking the completion of a descriptor that was not yet issued to the
hardware.
2/ in the tasklet, at atc_advance_work() time, atchan->active_list was
queried without holding the channel lock. This can result in concurrency
problems on atchan->active_list between the tasklet and
issue_pending().
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-8-tudor.ambarus@microchip.c…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index deb4c6027436..f1e6fa6af6c2 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -485,42 +485,6 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
dma_run_dependencies(txd);
}
-/**
- * atc_complete_all - finish work for all transactions
- * @atchan: channel to complete transactions for
- *
- * Eventually submit queued descriptors if any
- *
- * Assume channel is idle while calling this function
- * Called with atchan->lock held and bh disabled
- */
-static void atc_complete_all(struct at_dma_chan *atchan)
-{
- struct at_desc *desc, *_desc;
- LIST_HEAD(list);
- unsigned long flags;
-
- dev_vdbg(chan2dev(&atchan->chan_common), "complete all\n");
-
- spin_lock_irqsave(&atchan->lock, flags);
-
- /*
- * Submit queued descriptors ASAP, i.e. before we go through
- * the completed ones.
- */
- if (!list_empty(&atchan->queue))
- atc_dostart(atchan, atc_first_queued(atchan));
- /* empty active_list now it is completed */
- list_splice_init(&atchan->active_list, &list);
- /* empty queue list by moving descriptors (if any) to active_list */
- list_splice_init(&atchan->queue, &atchan->active_list);
-
- spin_unlock_irqrestore(&atchan->lock, flags);
-
- list_for_each_entry_safe(desc, _desc, &list, desc_node)
- atc_chain_complete(atchan, desc);
-}
-
/**
* atc_advance_work - at the end of a transaction, move forward
* @atchan: channel where the transaction ended
@@ -528,25 +492,20 @@ static void atc_complete_all(struct at_dma_chan *atchan)
static void atc_advance_work(struct at_dma_chan *atchan)
{
unsigned long flags;
- int ret;
dev_vdbg(chan2dev(&atchan->chan_common), "advance_work\n");
spin_lock_irqsave(&atchan->lock, flags);
- ret = atc_chan_is_enabled(atchan);
+ if (atc_chan_is_enabled(atchan) || list_empty(&atchan->active_list))
+ return spin_unlock_irqrestore(&atchan->lock, flags);
spin_unlock_irqrestore(&atchan->lock, flags);
- if (ret)
- return;
-
- if (list_empty(&atchan->active_list) ||
- list_is_singular(&atchan->active_list))
- return atc_complete_all(atchan);
atc_chain_complete(atchan, atc_first_active(atchan));
/* advance work */
spin_lock_irqsave(&atchan->lock, flags);
- atc_dostart(atchan, atc_first_active(atchan));
+ if (!list_empty(&atchan->active_list))
+ atc_dostart(atchan, atc_first_active(atchan));
spin_unlock_irqrestore(&atchan->lock, flags);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
c6babed879fb ("dmaengine: at_hdmac: Fix concurrency problems by removing atc_complete_all()")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c6babed879fbe82796a601bf097649e07382db46 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:41 +0300
Subject: [PATCH] dmaengine: at_hdmac: Fix concurrency problems by removing
atc_complete_all()
atc_complete_all() had concurrency bugs, thus remove it:
1/ atc_complete_all() in its entirety was buggy: when the atchan->queue
list (the one that contains descriptors that are not yet issued to the
hardware) contained descriptors, it started only the first one from
atchan->queue, but moved all descriptors from atchan->queue to
atchan->active_list and considered them all as started. This could result in
invoking the completion of a descriptor that was not yet issued to the
hardware.
2/ in the tasklet, at atc_advance_work() time, atchan->active_list was
queried without holding the channel lock. This can result in concurrency
problems on atchan->active_list between the tasklet and
issue_pending().
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-8-tudor.ambarus@microchip.c…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index deb4c6027436..f1e6fa6af6c2 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -485,42 +485,6 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
dma_run_dependencies(txd);
}
-/**
- * atc_complete_all - finish work for all transactions
- * @atchan: channel to complete transactions for
- *
- * Eventually submit queued descriptors if any
- *
- * Assume channel is idle while calling this function
- * Called with atchan->lock held and bh disabled
- */
-static void atc_complete_all(struct at_dma_chan *atchan)
-{
- struct at_desc *desc, *_desc;
- LIST_HEAD(list);
- unsigned long flags;
-
- dev_vdbg(chan2dev(&atchan->chan_common), "complete all\n");
-
- spin_lock_irqsave(&atchan->lock, flags);
-
- /*
- * Submit queued descriptors ASAP, i.e. before we go through
- * the completed ones.
- */
- if (!list_empty(&atchan->queue))
- atc_dostart(atchan, atc_first_queued(atchan));
- /* empty active_list now it is completed */
- list_splice_init(&atchan->active_list, &list);
- /* empty queue list by moving descriptors (if any) to active_list */
- list_splice_init(&atchan->queue, &atchan->active_list);
-
- spin_unlock_irqrestore(&atchan->lock, flags);
-
- list_for_each_entry_safe(desc, _desc, &list, desc_node)
- atc_chain_complete(atchan, desc);
-}
-
/**
* atc_advance_work - at the end of a transaction, move forward
* @atchan: channel where the transaction ended
@@ -528,25 +492,20 @@ static void atc_complete_all(struct at_dma_chan *atchan)
static void atc_advance_work(struct at_dma_chan *atchan)
{
unsigned long flags;
- int ret;
dev_vdbg(chan2dev(&atchan->chan_common), "advance_work\n");
spin_lock_irqsave(&atchan->lock, flags);
- ret = atc_chan_is_enabled(atchan);
+ if (atc_chan_is_enabled(atchan) || list_empty(&atchan->active_list))
+ return spin_unlock_irqrestore(&atchan->lock, flags);
spin_unlock_irqrestore(&atchan->lock, flags);
- if (ret)
- return;
-
- if (list_empty(&atchan->active_list) ||
- list_is_singular(&atchan->active_list))
- return atc_complete_all(atchan);
atc_chain_complete(atchan, atc_first_active(atchan));
/* advance work */
spin_lock_irqsave(&atchan->lock, flags);
- atc_dostart(atchan, atc_first_active(atchan));
+ if (!list_empty(&atchan->active_list))
+ atc_dostart(atchan, atc_first_active(atchan));
spin_unlock_irqrestore(&atchan->lock, flags);
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
c6babed879fb ("dmaengine: at_hdmac: Fix concurrency problems by removing atc_complete_all()")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c6babed879fbe82796a601bf097649e07382db46 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:41 +0300
Subject: [PATCH] dmaengine: at_hdmac: Fix concurrency problems by removing
atc_complete_all()
atc_complete_all() had concurrency bugs, thus remove it:
1/ atc_complete_all() in its entirety was buggy: when the atchan->queue
list (the one that contains descriptors not yet issued to the hardware)
contained descriptors, it fired just the first one from atchan->queue, but
moved all descriptors from atchan->queue to atchan->active_list and
considered them all as fired. This could result in calling the completion
of a descriptor that was not yet issued to the hardware.
2/ when the tasklet ran atc_advance_work(), atchan->active_list was
queried without holding the channel lock. This can result in concurrent
access to atchan->active_list from the tasklet and issue_pending().
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-8-tudor.ambarus@microchip.c…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index deb4c6027436..f1e6fa6af6c2 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -485,42 +485,6 @@ atc_chain_complete(struct at_dma_chan *atchan, struct at_desc *desc)
dma_run_dependencies(txd);
}
-/**
- * atc_complete_all - finish work for all transactions
- * @atchan: channel to complete transactions for
- *
- * Eventually submit queued descriptors if any
- *
- * Assume channel is idle while calling this function
- * Called with atchan->lock held and bh disabled
- */
-static void atc_complete_all(struct at_dma_chan *atchan)
-{
- struct at_desc *desc, *_desc;
- LIST_HEAD(list);
- unsigned long flags;
-
- dev_vdbg(chan2dev(&atchan->chan_common), "complete all\n");
-
- spin_lock_irqsave(&atchan->lock, flags);
-
- /*
- * Submit queued descriptors ASAP, i.e. before we go through
- * the completed ones.
- */
- if (!list_empty(&atchan->queue))
- atc_dostart(atchan, atc_first_queued(atchan));
- /* empty active_list now it is completed */
- list_splice_init(&atchan->active_list, &list);
- /* empty queue list by moving descriptors (if any) to active_list */
- list_splice_init(&atchan->queue, &atchan->active_list);
-
- spin_unlock_irqrestore(&atchan->lock, flags);
-
- list_for_each_entry_safe(desc, _desc, &list, desc_node)
- atc_chain_complete(atchan, desc);
-}
-
/**
* atc_advance_work - at the end of a transaction, move forward
* @atchan: channel where the transaction ended
@@ -528,25 +492,20 @@ static void atc_complete_all(struct at_dma_chan *atchan)
static void atc_advance_work(struct at_dma_chan *atchan)
{
unsigned long flags;
- int ret;
dev_vdbg(chan2dev(&atchan->chan_common), "advance_work\n");
spin_lock_irqsave(&atchan->lock, flags);
- ret = atc_chan_is_enabled(atchan);
+ if (atc_chan_is_enabled(atchan) || list_empty(&atchan->active_list))
+ return spin_unlock_irqrestore(&atchan->lock, flags);
spin_unlock_irqrestore(&atchan->lock, flags);
- if (ret)
- return;
-
- if (list_empty(&atchan->active_list) ||
- list_is_singular(&atchan->active_list))
- return atc_complete_all(atchan);
atc_chain_complete(atchan, atc_first_active(atchan));
/* advance work */
spin_lock_irqsave(&atchan->lock, flags);
- atc_dostart(atchan, atc_first_active(atchan));
+ if (!list_empty(&atchan->active_list))
+ atc_dostart(atchan, atc_first_active(atchan));
spin_unlock_irqrestore(&atchan->lock, flags);
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f645f85ae110 ("dmaengine: at_hdmac: Do not call the complete callback on device_terminate_all")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f645f85ae1104f8bd882f962ac0a69a1070076dd Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:39 +0300
Subject: [PATCH] dmaengine: at_hdmac: Do not call the complete callback on
device_terminate_all
The method was wrong because it violated the dmaengine API. For aborted
transfers the complete callback should not be called. Fix the behavior and
do not call the complete callback on device_terminate_all.
Fixes: 808347f6a317 ("dmaengine: at_hdmac: add DMA slave transfers")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-6-tudor.ambarus@microchip.c…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index cb5522417db6..11816484843e 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -1437,11 +1437,8 @@ static int atc_terminate_all(struct dma_chan *chan)
struct at_dma_chan *atchan = to_at_dma_chan(chan);
struct at_dma *atdma = to_at_dma(chan->device);
int chan_id = atchan->chan_common.chan_id;
- struct at_desc *desc, *_desc;
unsigned long flags;
- LIST_HEAD(list);
-
dev_vdbg(chan2dev(chan), "%s\n", __func__);
/*
@@ -1460,15 +1457,11 @@ static int atc_terminate_all(struct dma_chan *chan)
cpu_relax();
/* active_list entries will end up before queued entries */
- list_splice_init(&atchan->queue, &list);
- list_splice_init(&atchan->active_list, &list);
+ list_splice_tail_init(&atchan->queue, &atchan->free_list);
+ list_splice_tail_init(&atchan->active_list, &atchan->free_list);
spin_unlock_irqrestore(&atchan->lock, flags);
- /* Flush all pending and queued descriptors */
- list_for_each_entry_safe(desc, _desc, &list, desc_node)
- atc_chain_complete(atchan, desc);
-
clear_bit(ATC_IS_PAUSED, &atchan->status);
/* if channel dedicated to cyclic operations, free it */
clear_bit(ATC_IS_CYCLIC, &atchan->status);
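From the client side, the dmaengine contract this fix restores can be summarized with a short, hedged sketch using the generic dmaengine client API; my_release_buffers() is a placeholder for whatever cleanup the client does. After terminating a channel, no completion callback may fire for the aborted descriptors, so the client has to reclaim its own state.
/* Hedged client-side sketch; my_release_buffers() is illustrative only. */
#include <linux/dmaengine.h>

static void my_stop_transfers(struct dma_chan *chan)
{
	/*
	 * Abort whatever is queued or in flight. Per the dmaengine API,
	 * the completion callbacks of these aborted descriptors must not
	 * be invoked - which is what the change above enforces in
	 * atc_terminate_all().
	 */
	dmaengine_terminate_sync(chan);

	/* The client reclaims its own resources instead of waiting for
	 * callbacks that will never come. */
	my_release_buffers(chan);
}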
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f645f85ae110 ("dmaengine: at_hdmac: Do not call the complete callback on device_terminate_all")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f645f85ae1104f8bd882f962ac0a69a1070076dd Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:39 +0300
Subject: [PATCH] dmaengine: at_hdmac: Do not call the complete callback on
device_terminate_all
The method was wrong because it violated the dmaengine API. For aborted
transfers the complete callback should not be called. Fix the behavior and
do not call the complete callback on device_terminate_all.
Fixes: 808347f6a317 ("dmaengine: at_hdmac: add DMA slave transfers")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-6-tudor.ambarus@microchip.c…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index cb5522417db6..11816484843e 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -1437,11 +1437,8 @@ static int atc_terminate_all(struct dma_chan *chan)
struct at_dma_chan *atchan = to_at_dma_chan(chan);
struct at_dma *atdma = to_at_dma(chan->device);
int chan_id = atchan->chan_common.chan_id;
- struct at_desc *desc, *_desc;
unsigned long flags;
- LIST_HEAD(list);
-
dev_vdbg(chan2dev(chan), "%s\n", __func__);
/*
@@ -1460,15 +1457,11 @@ static int atc_terminate_all(struct dma_chan *chan)
cpu_relax();
/* active_list entries will end up before queued entries */
- list_splice_init(&atchan->queue, &list);
- list_splice_init(&atchan->active_list, &list);
+ list_splice_tail_init(&atchan->queue, &atchan->free_list);
+ list_splice_tail_init(&atchan->active_list, &atchan->free_list);
spin_unlock_irqrestore(&atchan->lock, flags);
- /* Flush all pending and queued descriptors */
- list_for_each_entry_safe(desc, _desc, &list, desc_node)
- atc_chain_complete(atchan, desc);
-
clear_bit(ATC_IS_PAUSED, &atchan->status);
/* if channel dedicated to cyclic operations, free it */
clear_bit(ATC_IS_CYCLIC, &atchan->status);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f645f85ae110 ("dmaengine: at_hdmac: Do not call the complete callback on device_terminate_all")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f645f85ae1104f8bd882f962ac0a69a1070076dd Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:39 +0300
Subject: [PATCH] dmaengine: at_hdmac: Do not call the complete callback on
device_terminate_all
The method was wrong because it violated the dmaengine API. For aborted
transfers the complete callback should not be called. Fix the behavior and
do not call the complete callback on device_terminate_all.
Fixes: 808347f6a317 ("dmaengine: at_hdmac: add DMA slave transfers")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-6-tudor.ambarus@microchip.c…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index cb5522417db6..11816484843e 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -1437,11 +1437,8 @@ static int atc_terminate_all(struct dma_chan *chan)
struct at_dma_chan *atchan = to_at_dma_chan(chan);
struct at_dma *atdma = to_at_dma(chan->device);
int chan_id = atchan->chan_common.chan_id;
- struct at_desc *desc, *_desc;
unsigned long flags;
- LIST_HEAD(list);
-
dev_vdbg(chan2dev(chan), "%s\n", __func__);
/*
@@ -1460,15 +1457,11 @@ static int atc_terminate_all(struct dma_chan *chan)
cpu_relax();
/* active_list entries will end up before queued entries */
- list_splice_init(&atchan->queue, &list);
- list_splice_init(&atchan->active_list, &list);
+ list_splice_tail_init(&atchan->queue, &atchan->free_list);
+ list_splice_tail_init(&atchan->active_list, &atchan->free_list);
spin_unlock_irqrestore(&atchan->lock, flags);
- /* Flush all pending and queued descriptors */
- list_for_each_entry_safe(desc, _desc, &list, desc_node)
- atc_chain_complete(atchan, desc);
-
clear_bit(ATC_IS_PAUSED, &atchan->status);
/* if channel dedicated to cyclic operations, free it */
clear_bit(ATC_IS_CYCLIC, &atchan->status);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f645f85ae110 ("dmaengine: at_hdmac: Do not call the complete callback on device_terminate_all")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f645f85ae1104f8bd882f962ac0a69a1070076dd Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:39 +0300
Subject: [PATCH] dmaengine: at_hdmac: Do not call the complete callback on
device_terminate_all
The method was wrong because it violated the dmaengine API. For aborted
transfers the complete callback should not be called. Fix the behavior and
do not call the complete callback on device_terminate_all.
Fixes: 808347f6a317 ("dmaengine: at_hdmac: add DMA slave transfers")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-6-tudor.ambarus@microchip.c…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index cb5522417db6..11816484843e 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -1437,11 +1437,8 @@ static int atc_terminate_all(struct dma_chan *chan)
struct at_dma_chan *atchan = to_at_dma_chan(chan);
struct at_dma *atdma = to_at_dma(chan->device);
int chan_id = atchan->chan_common.chan_id;
- struct at_desc *desc, *_desc;
unsigned long flags;
- LIST_HEAD(list);
-
dev_vdbg(chan2dev(chan), "%s\n", __func__);
/*
@@ -1460,15 +1457,11 @@ static int atc_terminate_all(struct dma_chan *chan)
cpu_relax();
/* active_list entries will end up before queued entries */
- list_splice_init(&atchan->queue, &list);
- list_splice_init(&atchan->active_list, &list);
+ list_splice_tail_init(&atchan->queue, &atchan->free_list);
+ list_splice_tail_init(&atchan->active_list, &atchan->free_list);
spin_unlock_irqrestore(&atchan->lock, flags);
- /* Flush all pending and queued descriptors */
- list_for_each_entry_safe(desc, _desc, &list, desc_node)
- atc_chain_complete(atchan, desc);
-
clear_bit(ATC_IS_PAUSED, &atchan->status);
/* if channel dedicated to cyclic operations, free it */
clear_bit(ATC_IS_CYCLIC, &atchan->status);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
fcd37565efda ("dmaengine: at_hdmac: Fix premature completion of desc in issue_pending")
8a47221fc284 ("dmaengine: at_hdmac: Start transfer for cyclic channels in issue_pending")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fcd37565efdaffeac179d0f0ce980ac79bfdf569 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:38 +0300
Subject: [PATCH] dmaengine: at_hdmac: Fix premature completion of desc in
issue_pending
Multiple calls to atc_issue_pending() could result in a premature
completion of a descriptor from the atchan->active list, as the method
always completed the first active descriptor from the list. Instead,
issue_pending() should just take the first transaction descriptor from the
pending queue, move it to active_list and start the transfer.
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-5-tudor.ambarus@microchip.c…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index e9d0c3632868..cb5522417db6 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -1527,16 +1527,26 @@ atc_tx_status(struct dma_chan *chan,
}
/**
- * atc_issue_pending - try to finish work
+ * atc_issue_pending - takes the first transaction descriptor in the pending
+ * queue and starts the transfer.
* @chan: target DMA channel
*/
static void atc_issue_pending(struct dma_chan *chan)
{
- struct at_dma_chan *atchan = to_at_dma_chan(chan);
+ struct at_dma_chan *atchan = to_at_dma_chan(chan);
+ struct at_desc *desc;
+ unsigned long flags;
dev_vdbg(chan2dev(chan), "issue_pending\n");
- atc_advance_work(atchan);
+ spin_lock_irqsave(&atchan->lock, flags);
+ if (atc_chan_is_enabled(atchan) || list_empty(&atchan->queue))
+ return spin_unlock_irqrestore(&atchan->lock, flags);
+
+ desc = atc_first_queued(atchan);
+ list_move_tail(&desc->desc_node, &atchan->active_list);
+ atc_dostart(atchan, desc);
+ spin_unlock_irqrestore(&atchan->lock, flags);
}
/**
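For context, the commit's view of issue_pending() matches the generic dmaengine client flow, sketched below in hedged form (chan, buf_dma, len and my_done_cb are assumed to exist on the client side): submitting a descriptor only queues it, and nothing reaches the hardware until dma_async_issue_pending() is called.
/* Hedged sketch of a slave transfer submission; error handling trimmed. */
#include <linux/dmaengine.h>

static int my_submit_one(struct dma_chan *chan, dma_addr_t buf_dma, size_t len,
			 dma_async_tx_callback my_done_cb, void *arg)
{
	struct dma_async_tx_descriptor *tx;
	dma_cookie_t cookie;

	tx = dmaengine_prep_slave_single(chan, buf_dma, len,
					 DMA_MEM_TO_DEV, DMA_PREP_INTERRUPT);
	if (!tx)
		return -ENOMEM;

	tx->callback = my_done_cb;	/* runs when the transfer completes */
	tx->callback_param = arg;

	cookie = dmaengine_submit(tx);	/* queues the descriptor, does not start it */
	if (dma_submit_error(cookie))
		return -EIO;

	dma_async_issue_pending(chan);	/* hands the head of the queue to hardware */
	return 0;
}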
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
fcd37565efda ("dmaengine: at_hdmac: Fix premature completion of desc in issue_pending")
8a47221fc284 ("dmaengine: at_hdmac: Start transfer for cyclic channels in issue_pending")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fcd37565efdaffeac179d0f0ce980ac79bfdf569 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:38 +0300
Subject: [PATCH] dmaengine: at_hdmac: Fix premature completion of desc in
issue_pending
Multiple calls to atc_issue_pending() could result in a premature
completion of a descriptor from the atchan->active list, as the method
always completed the first active descriptor from the list. Instead,
issue_pending() should just take the first transaction descriptor from the
pending queue, move it to active_list and start the transfer.
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-5-tudor.ambarus@microchip.c…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index e9d0c3632868..cb5522417db6 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -1527,16 +1527,26 @@ atc_tx_status(struct dma_chan *chan,
}
/**
- * atc_issue_pending - try to finish work
+ * atc_issue_pending - takes the first transaction descriptor in the pending
+ * queue and starts the transfer.
* @chan: target DMA channel
*/
static void atc_issue_pending(struct dma_chan *chan)
{
- struct at_dma_chan *atchan = to_at_dma_chan(chan);
+ struct at_dma_chan *atchan = to_at_dma_chan(chan);
+ struct at_desc *desc;
+ unsigned long flags;
dev_vdbg(chan2dev(chan), "issue_pending\n");
- atc_advance_work(atchan);
+ spin_lock_irqsave(&atchan->lock, flags);
+ if (atc_chan_is_enabled(atchan) || list_empty(&atchan->queue))
+ return spin_unlock_irqrestore(&atchan->lock, flags);
+
+ desc = atc_first_queued(atchan);
+ list_move_tail(&desc->desc_node, &atchan->active_list);
+ atc_dostart(atchan, desc);
+ spin_unlock_irqrestore(&atchan->lock, flags);
}
/**
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
fcd37565efda ("dmaengine: at_hdmac: Fix premature completion of desc in issue_pending")
8a47221fc284 ("dmaengine: at_hdmac: Start transfer for cyclic channels in issue_pending")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fcd37565efdaffeac179d0f0ce980ac79bfdf569 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:38 +0300
Subject: [PATCH] dmaengine: at_hdmac: Fix premature completion of desc in
issue_pending
Multiple calls to atc_issue_pending() could result in a premature
completion of a descriptor from the atchan->active list, as the method
always completed the first active descriptor from the list. Instead,
issue_pending() should just take the first transaction descriptor from the
pending queue, move it to active_list and start the transfer.
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-5-tudor.ambarus@microchip.c…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index e9d0c3632868..cb5522417db6 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -1527,16 +1527,26 @@ atc_tx_status(struct dma_chan *chan,
}
/**
- * atc_issue_pending - try to finish work
+ * atc_issue_pending - takes the first transaction descriptor in the pending
+ * queue and starts the transfer.
* @chan: target DMA channel
*/
static void atc_issue_pending(struct dma_chan *chan)
{
- struct at_dma_chan *atchan = to_at_dma_chan(chan);
+ struct at_dma_chan *atchan = to_at_dma_chan(chan);
+ struct at_desc *desc;
+ unsigned long flags;
dev_vdbg(chan2dev(chan), "issue_pending\n");
- atc_advance_work(atchan);
+ spin_lock_irqsave(&atchan->lock, flags);
+ if (atc_chan_is_enabled(atchan) || list_empty(&atchan->queue))
+ return spin_unlock_irqrestore(&atchan->lock, flags);
+
+ desc = atc_first_queued(atchan);
+ list_move_tail(&desc->desc_node, &atchan->active_list);
+ atc_dostart(atchan, desc);
+ spin_unlock_irqrestore(&atchan->lock, flags);
}
/**
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
fcd37565efda ("dmaengine: at_hdmac: Fix premature completion of desc in issue_pending")
8a47221fc284 ("dmaengine: at_hdmac: Start transfer for cyclic channels in issue_pending")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fcd37565efdaffeac179d0f0ce980ac79bfdf569 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:38 +0300
Subject: [PATCH] dmaengine: at_hdmac: Fix premature completion of desc in
issue_pending
Multiple calls to atc_issue_pending() could result in a premature
completion of a descriptor from the atchan->active list, as the method
always completed the first active descriptor from the list. Instead,
issue_pending() should just take the first transaction descriptor from the
pending queue, move it to active_list and start the transfer.
Fixes: dc78baa2b90b ("dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-5-tudor.ambarus@microchip.c…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index e9d0c3632868..cb5522417db6 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -1527,16 +1527,26 @@ atc_tx_status(struct dma_chan *chan,
}
/**
- * atc_issue_pending - try to finish work
+ * atc_issue_pending - takes the first transaction descriptor in the pending
+ * queue and starts the transfer.
* @chan: target DMA channel
*/
static void atc_issue_pending(struct dma_chan *chan)
{
- struct at_dma_chan *atchan = to_at_dma_chan(chan);
+ struct at_dma_chan *atchan = to_at_dma_chan(chan);
+ struct at_desc *desc;
+ unsigned long flags;
dev_vdbg(chan2dev(chan), "issue_pending\n");
- atc_advance_work(atchan);
+ spin_lock_irqsave(&atchan->lock, flags);
+ if (atc_chan_is_enabled(atchan) || list_empty(&atchan->queue))
+ return spin_unlock_irqrestore(&atchan->lock, flags);
+
+ desc = atc_first_queued(atchan);
+ list_move_tail(&desc->desc_node, &atchan->active_list);
+ atc_dostart(atchan, desc);
+ spin_unlock_irqrestore(&atchan->lock, flags);
}
/**
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8a47221fc284 ("dmaengine: at_hdmac: Start transfer for cyclic channels in issue_pending")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8a47221fc28417ff8a32a4f92d4448a56c3cf7e1 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:37 +0300
Subject: [PATCH] dmaengine: at_hdmac: Start transfer for cyclic channels in
issue_pending
Cyclic channels, too, must call issue_pending in order to start a transfer.
Start the transfer in issue_pending regardless of the type of channel.
This happened to work before because, in the past, the transfer was started
at tx_submit time when there was only one descriptor in the transfer list.
Fixes: 53830cc75974 ("dmaengine: at_hdmac: add cyclic DMA operation support")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-4-tudor.ambarus@microchip.c…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index 3f71f4d2f467..e9d0c3632868 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -1536,10 +1536,6 @@ static void atc_issue_pending(struct dma_chan *chan)
dev_vdbg(chan2dev(chan), "issue_pending\n");
- /* Not needed for cyclic transfers */
- if (atc_chan_is_cyclic(atchan))
- return;
-
atc_advance_work(atchan);
}
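The same applies on the client side for cyclic (e.g. audio) transfers: the ring only starts once the client calls dma_async_issue_pending(). A hedged sketch follows; buf_dma, buf_len, period_len and my_period_elapsed are illustrative placeholders.
/* Hedged sketch of starting a cyclic transfer via the dmaengine client API. */
#include <linux/dmaengine.h>

static int my_start_cyclic(struct dma_chan *chan, dma_addr_t buf_dma,
			   size_t buf_len, size_t period_len)
{
	struct dma_async_tx_descriptor *tx;

	tx = dmaengine_prep_dma_cyclic(chan, buf_dma, buf_len, period_len,
				       DMA_MEM_TO_DEV, DMA_PREP_INTERRUPT);
	if (!tx)
		return -ENOMEM;

	tx->callback = my_period_elapsed;	/* fires once per period */
	dmaengine_submit(tx);
	dma_async_issue_pending(chan);		/* this is what must also kick cyclic channels */
	return 0;
}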
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8a47221fc284 ("dmaengine: at_hdmac: Start transfer for cyclic channels in issue_pending")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8a47221fc28417ff8a32a4f92d4448a56c3cf7e1 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:37 +0300
Subject: [PATCH] dmaengine: at_hdmac: Start transfer for cyclic channels in
issue_pending
Cyclic channels, too, must call issue_pending in order to start a transfer.
Start the transfer in issue_pending regardless of the type of channel.
This happened to work before because, in the past, the transfer was started
at tx_submit time when there was only one descriptor in the transfer list.
Fixes: 53830cc75974 ("dmaengine: at_hdmac: add cyclic DMA operation support")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-4-tudor.ambarus@microchip.c…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index 3f71f4d2f467..e9d0c3632868 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -1536,10 +1536,6 @@ static void atc_issue_pending(struct dma_chan *chan)
dev_vdbg(chan2dev(chan), "issue_pending\n");
- /* Not needed for cyclic transfers */
- if (atc_chan_is_cyclic(atchan))
- return;
-
atc_advance_work(atchan);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8a47221fc284 ("dmaengine: at_hdmac: Start transfer for cyclic channels in issue_pending")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8a47221fc28417ff8a32a4f92d4448a56c3cf7e1 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:37 +0300
Subject: [PATCH] dmaengine: at_hdmac: Start transfer for cyclic channels in
issue_pending
Cyclic channels, too, must call issue_pending in order to start a transfer.
Start the transfer in issue_pending regardless of the type of channel.
This happened to work before because, in the past, the transfer was started
at tx_submit time when there was only one descriptor in the transfer list.
Fixes: 53830cc75974 ("dmaengine: at_hdmac: add cyclic DMA operation support")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-4-tudor.ambarus@microchip.c…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index 3f71f4d2f467..e9d0c3632868 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -1536,10 +1536,6 @@ static void atc_issue_pending(struct dma_chan *chan)
dev_vdbg(chan2dev(chan), "issue_pending\n");
- /* Not needed for cyclic transfers */
- if (atc_chan_is_cyclic(atchan))
- return;
-
atc_advance_work(atchan);
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8a47221fc284 ("dmaengine: at_hdmac: Start transfer for cyclic channels in issue_pending")
078a6506141a ("dmaengine: at_hdmac: Fix deadlocks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8a47221fc28417ff8a32a4f92d4448a56c3cf7e1 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Tue, 25 Oct 2022 12:02:37 +0300
Subject: [PATCH] dmaengine: at_hdmac: Start transfer for cyclic channels in
issue_pending
Cyclic channels, too, must call issue_pending in order to start a transfer.
Start the transfer in issue_pending regardless of the type of channel.
This happened to work before because, in the past, the transfer was started
at tx_submit time when there was only one descriptor in the transfer list.
Fixes: 53830cc75974 ("dmaengine: at_hdmac: add cyclic DMA operation support")
Reported-by: Peter Rosin <peda(a)axentia.se>
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/13c6c9a2-6db5-c3bf-349b-4c127ad3496a@axentia.s…
Acked-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20221025090306.297886-1-tudor.ambarus@microchip.c…
Link: https://lore.kernel.org/r/20221025090306.297886-4-tudor.ambarus@microchip.c…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
index 3f71f4d2f467..e9d0c3632868 100644
--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -1536,10 +1536,6 @@ static void atc_issue_pending(struct dma_chan *chan)
dev_vdbg(chan2dev(chan), "issue_pending\n");
- /* Not needed for cyclic transfers */
- if (atc_chan_is_cyclic(atchan))
- return;
-
atc_advance_work(atchan);
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
e61ab42de874 ("KVM: SVM: move guest vmsave/vmload back to assembly")
f6d58266d731 ("KVM: SVM: retrieve VMCB from assembly")
f7ef280132f9 ("KVM: SVM: adjust register allocation for __svm_vcpu_run()")
16fdc1de169e ("KVM: SVM: replace regs argument of __svm_vcpu_run() with vcpu_svm")
debc5a1ec0d1 ("KVM: x86: use a separate asm-offsets.c file")
bb06650634d3 ("KVM: VMX: Convert launched argument to flags")
8bd200d23ec4 ("KVM: VMX: Flatten __vmx_vcpu_run()")
527a534c7326 ("x86/tdx: Provide common base for SEAMCALL and TDCALL C wrappers")
59bd54a84d15 ("x86/tdx: Detect running as a TDX guest in early boot")
6198311093da ("x86/cc: Move arch/x86/{kernel/cc_platform.c => coco/core.c}")
f94909ceb1ed ("x86: Prepare asm files for straight-line-speculation")
22da5a07c75e ("x86/lib/atomic64_386_32: Rename things")
1367afaa2ee9 ("x86/entry: Use the correct fence macro after swapgs in kernel CR3")
c07e45553da1 ("x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry()")
6e5772c8d9cf ("Merge tag 'x86_cc_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e61ab42de874c5af8c5d98b327c77a374d9e7da1 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Mon, 7 Nov 2022 05:14:27 -0500
Subject: [PATCH] KVM: SVM: move guest vmsave/vmload back to assembly
It is error-prone that code after vmexit cannot access percpu data
because GSBASE has not been restored yet. It forces MSR_IA32_SPEC_CTRL
save/restore to happen very late, after the predictor untraining
sequence, and it gets in the way of return stack depth tracking
(a retbleed mitigation that is in linux-next as of 2022-11-09).
As a first step towards fixing that, move the VMCB VMSAVE/VMLOAD to
assembly, essentially undoing commit fb0c4a4fee5a ("KVM: SVM: move
VMLOAD/VMSAVE to C code", 2021-03-15). The reason for that commit was
that it made it simpler to use a different VMCB for VMLOAD/VMSAVE versus
VMRUN; but that is not a big hassle anymore thanks to the kvm-asm-offsets
machinery and other related cleanups.
The idea on how to number the exception tables is stolen from
a prototype patch by Peter Zijlstra.
Cc: stable(a)vger.kernel.org
Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk")
Link: <https://lore.kernel.org/all/f571e404-e625-bae1-10e9-449b2eb4cbd8@citrix.com/>
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/kvm-asm-offsets.c b/arch/x86/kvm/kvm-asm-offsets.c
index f1b694e431ae..f83e88b85bf2 100644
--- a/arch/x86/kvm/kvm-asm-offsets.c
+++ b/arch/x86/kvm/kvm-asm-offsets.c
@@ -16,6 +16,7 @@ static void __used common(void)
BLANK();
OFFSET(SVM_vcpu_arch_regs, vcpu_svm, vcpu.arch.regs);
OFFSET(SVM_current_vmcb, vcpu_svm, current_vmcb);
+ OFFSET(SVM_vmcb01, vcpu_svm, vmcb01);
OFFSET(KVM_VMCB_pa, kvm_vmcb_info, pa);
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 48274c93d78b..4e3a47eb5002 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3910,16 +3910,7 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
} else {
struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, vcpu->cpu);
- /*
- * Use a single vmcb (vmcb01 because it's always valid) for
- * context switching guest state via VMLOAD/VMSAVE, that way
- * the state doesn't need to be copied between vmcb01 and
- * vmcb02 when switching vmcbs for nested virtualization.
- */
- vmload(svm->vmcb01.pa);
__svm_vcpu_run(svm);
- vmsave(svm->vmcb01.pa);
-
vmload(__sme_page_pa(sd->save_area));
}
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index d07bac1952c5..5bc2ed7d79c0 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -28,6 +28,8 @@
#define VCPU_R15 (SVM_vcpu_arch_regs + __VCPU_REGS_R15 * WORD_SIZE)
#endif
+#define SVM_vmcb01_pa (SVM_vmcb01 + KVM_VMCB_pa)
+
.section .noinstr.text, "ax"
/**
@@ -55,6 +57,16 @@ SYM_FUNC_START(__svm_vcpu_run)
mov %_ASM_ARG1, %_ASM_DI
.endif
+ /*
+ * Use a single vmcb (vmcb01 because it's always valid) for
+ * context switching guest state via VMLOAD/VMSAVE, that way
+ * the state doesn't need to be copied between vmcb01 and
+ * vmcb02 when switching vmcbs for nested virtualization.
+ */
+ mov SVM_vmcb01_pa(%_ASM_DI), %_ASM_AX
+1: vmload %_ASM_AX
+2:
+
/* Get svm->current_vmcb->pa into RAX. */
mov SVM_current_vmcb(%_ASM_DI), %_ASM_AX
mov KVM_VMCB_pa(%_ASM_AX), %_ASM_AX
@@ -80,16 +92,11 @@ SYM_FUNC_START(__svm_vcpu_run)
/* Enter guest mode */
sti
-1: vmrun %_ASM_AX
-
-2: cli
-
-#ifdef CONFIG_RETPOLINE
- /* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
- FILL_RETURN_BUFFER %_ASM_AX, RSB_CLEAR_LOOPS, X86_FEATURE_RETPOLINE
-#endif
+3: vmrun %_ASM_AX
+4:
+ cli
- /* "POP" @svm to RAX. */
+ /* Pop @svm to RAX while it's the only available register. */
pop %_ASM_AX
/* Save all guest registers. */
@@ -110,6 +117,18 @@ SYM_FUNC_START(__svm_vcpu_run)
mov %r15, VCPU_R15(%_ASM_AX)
#endif
+ /* @svm can stay in RDI from now on. */
+ mov %_ASM_AX, %_ASM_DI
+
+ mov SVM_vmcb01_pa(%_ASM_DI), %_ASM_AX
+5: vmsave %_ASM_AX
+6:
+
+#ifdef CONFIG_RETPOLINE
+ /* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
+ FILL_RETURN_BUFFER %_ASM_AX, RSB_CLEAR_LOOPS, X86_FEATURE_RETPOLINE
+#endif
+
/*
* Mitigate RETBleed for AMD/Hygon Zen uarch. RET should be
* untrained as soon as we exit the VM and are back to the
@@ -159,11 +178,19 @@ SYM_FUNC_START(__svm_vcpu_run)
pop %_ASM_BP
RET
-3: cmpb $0, kvm_rebooting
+10: cmpb $0, kvm_rebooting
jne 2b
ud2
+30: cmpb $0, kvm_rebooting
+ jne 4b
+ ud2
+50: cmpb $0, kvm_rebooting
+ jne 6b
+ ud2
- _ASM_EXTABLE(1b, 3b)
+ _ASM_EXTABLE(1b, 10b)
+ _ASM_EXTABLE(3b, 30b)
+ _ASM_EXTABLE(5b, 50b)
SYM_FUNC_END(__svm_vcpu_run)
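The "kvm-asm-offsets machinery" referenced above is the usual kbuild asm-offsets pattern. Roughly, and as a hedged sketch modeled on include/linux/kbuild.h rather than copied verbatim, a C file emits magic markers that the build turns into a generated header of plain #define constants; that is how names like SVM_vmcb01 and KVM_VMCB_pa become usable from vmenter.S.
/* Hedged sketch of the asm-offsets idea (see include/linux/kbuild.h for the
 * real macros). The compiler evaluates offsetof() at build time and emits it
 * as text in the assembly output; a build script scrapes the "->" markers and
 * writes a header such as kvm-asm-offsets.h containing plain #defines. */
#include <linux/stddef.h>

#define DEFINE(sym, val) \
	asm volatile("\n.ascii \"->" #sym " %0 " #val "\"" : : "i" (val))
#define OFFSET(sym, str, mem) \
	DEFINE(sym, offsetof(struct str, mem))

/* With that, OFFSET(SVM_vmcb01, vcpu_svm, vmcb01) ultimately yields something
 * like "#define SVM_vmcb01 <byte offset of vmcb01 in vcpu_svm>", and the
 * assembly can add KVM_VMCB_pa to it, as SVM_vmcb01_pa does above. */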
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
e61ab42de874 ("KVM: SVM: move guest vmsave/vmload back to assembly")
f6d58266d731 ("KVM: SVM: retrieve VMCB from assembly")
f7ef280132f9 ("KVM: SVM: adjust register allocation for __svm_vcpu_run()")
16fdc1de169e ("KVM: SVM: replace regs argument of __svm_vcpu_run() with vcpu_svm")
debc5a1ec0d1 ("KVM: x86: use a separate asm-offsets.c file")
bb06650634d3 ("KVM: VMX: Convert launched argument to flags")
8bd200d23ec4 ("KVM: VMX: Flatten __vmx_vcpu_run()")
527a534c7326 ("x86/tdx: Provide common base for SEAMCALL and TDCALL C wrappers")
59bd54a84d15 ("x86/tdx: Detect running as a TDX guest in early boot")
6198311093da ("x86/cc: Move arch/x86/{kernel/cc_platform.c => coco/core.c}")
f94909ceb1ed ("x86: Prepare asm files for straight-line-speculation")
22da5a07c75e ("x86/lib/atomic64_386_32: Rename things")
1367afaa2ee9 ("x86/entry: Use the correct fence macro after swapgs in kernel CR3")
c07e45553da1 ("x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry()")
6e5772c8d9cf ("Merge tag 'x86_cc_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e61ab42de874c5af8c5d98b327c77a374d9e7da1 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Mon, 7 Nov 2022 05:14:27 -0500
Subject: [PATCH] KVM: SVM: move guest vmsave/vmload back to assembly
It is error-prone that code after vmexit cannot access percpu data
because GSBASE has not been restored yet. It forces MSR_IA32_SPEC_CTRL
save/restore to happen very late, after the predictor untraining
sequence, and it gets in the way of return stack depth tracking
(a retbleed mitigation that is in linux-next as of 2022-11-09).
As a first step towards fixing that, move the VMCB VMSAVE/VMLOAD to
assembly, essentially undoing commit fb0c4a4fee5a ("KVM: SVM: move
VMLOAD/VMSAVE to C code", 2021-03-15). The reason for that commit was
that it made it simpler to use a different VMCB for VMLOAD/VMSAVE versus
VMRUN; but that is not a big hassle anymore thanks to the kvm-asm-offsets
machinery and other related cleanups.
The idea on how to number the exception tables is stolen from
a prototype patch by Peter Zijlstra.
Cc: stable(a)vger.kernel.org
Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk")
Link: <https://lore.kernel.org/all/f571e404-e625-bae1-10e9-449b2eb4cbd8@citrix.com/>
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/kvm-asm-offsets.c b/arch/x86/kvm/kvm-asm-offsets.c
index f1b694e431ae..f83e88b85bf2 100644
--- a/arch/x86/kvm/kvm-asm-offsets.c
+++ b/arch/x86/kvm/kvm-asm-offsets.c
@@ -16,6 +16,7 @@ static void __used common(void)
BLANK();
OFFSET(SVM_vcpu_arch_regs, vcpu_svm, vcpu.arch.regs);
OFFSET(SVM_current_vmcb, vcpu_svm, current_vmcb);
+ OFFSET(SVM_vmcb01, vcpu_svm, vmcb01);
OFFSET(KVM_VMCB_pa, kvm_vmcb_info, pa);
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 48274c93d78b..4e3a47eb5002 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3910,16 +3910,7 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
} else {
struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, vcpu->cpu);
- /*
- * Use a single vmcb (vmcb01 because it's always valid) for
- * context switching guest state via VMLOAD/VMSAVE, that way
- * the state doesn't need to be copied between vmcb01 and
- * vmcb02 when switching vmcbs for nested virtualization.
- */
- vmload(svm->vmcb01.pa);
__svm_vcpu_run(svm);
- vmsave(svm->vmcb01.pa);
-
vmload(__sme_page_pa(sd->save_area));
}
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index d07bac1952c5..5bc2ed7d79c0 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -28,6 +28,8 @@
#define VCPU_R15 (SVM_vcpu_arch_regs + __VCPU_REGS_R15 * WORD_SIZE)
#endif
+#define SVM_vmcb01_pa (SVM_vmcb01 + KVM_VMCB_pa)
+
.section .noinstr.text, "ax"
/**
@@ -55,6 +57,16 @@ SYM_FUNC_START(__svm_vcpu_run)
mov %_ASM_ARG1, %_ASM_DI
.endif
+ /*
+ * Use a single vmcb (vmcb01 because it's always valid) for
+ * context switching guest state via VMLOAD/VMSAVE, that way
+ * the state doesn't need to be copied between vmcb01 and
+ * vmcb02 when switching vmcbs for nested virtualization.
+ */
+ mov SVM_vmcb01_pa(%_ASM_DI), %_ASM_AX
+1: vmload %_ASM_AX
+2:
+
/* Get svm->current_vmcb->pa into RAX. */
mov SVM_current_vmcb(%_ASM_DI), %_ASM_AX
mov KVM_VMCB_pa(%_ASM_AX), %_ASM_AX
@@ -80,16 +92,11 @@ SYM_FUNC_START(__svm_vcpu_run)
/* Enter guest mode */
sti
-1: vmrun %_ASM_AX
-
-2: cli
-
-#ifdef CONFIG_RETPOLINE
- /* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
- FILL_RETURN_BUFFER %_ASM_AX, RSB_CLEAR_LOOPS, X86_FEATURE_RETPOLINE
-#endif
+3: vmrun %_ASM_AX
+4:
+ cli
- /* "POP" @svm to RAX. */
+ /* Pop @svm to RAX while it's the only available register. */
pop %_ASM_AX
/* Save all guest registers. */
@@ -110,6 +117,18 @@ SYM_FUNC_START(__svm_vcpu_run)
mov %r15, VCPU_R15(%_ASM_AX)
#endif
+ /* @svm can stay in RDI from now on. */
+ mov %_ASM_AX, %_ASM_DI
+
+ mov SVM_vmcb01_pa(%_ASM_DI), %_ASM_AX
+5: vmsave %_ASM_AX
+6:
+
+#ifdef CONFIG_RETPOLINE
+ /* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
+ FILL_RETURN_BUFFER %_ASM_AX, RSB_CLEAR_LOOPS, X86_FEATURE_RETPOLINE
+#endif
+
/*
* Mitigate RETBleed for AMD/Hygon Zen uarch. RET should be
* untrained as soon as we exit the VM and are back to the
@@ -159,11 +178,19 @@ SYM_FUNC_START(__svm_vcpu_run)
pop %_ASM_BP
RET
-3: cmpb $0, kvm_rebooting
+10: cmpb $0, kvm_rebooting
jne 2b
ud2
+30: cmpb $0, kvm_rebooting
+ jne 4b
+ ud2
+50: cmpb $0, kvm_rebooting
+ jne 6b
+ ud2
- _ASM_EXTABLE(1b, 3b)
+ _ASM_EXTABLE(1b, 10b)
+ _ASM_EXTABLE(3b, 30b)
+ _ASM_EXTABLE(5b, 50b)
SYM_FUNC_END(__svm_vcpu_run)
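To see why percpu data is off-limits right after VMRUN until the host GSBASE is restored, here is a hedged illustration; my_counter is a made-up variable. Kernel percpu accesses on x86-64 are %gs-relative, so they dereference whatever base register value is currently loaded.
/* Hedged illustration only; my_counter is not a real kernel symbol. */
#include <linux/percpu.h>

DECLARE_PER_CPU(unsigned long, my_counter);

static unsigned long peek_counter(void)
{
	/*
	 * this_cpu_read() compiles to a %gs-relative load on x86-64; if the
	 * host GSBASE has not been restored after VMRUN, the load would go
	 * through whatever base the guest left behind.
	 */
	return this_cpu_read(my_counter);
}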
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f6d58266d731 ("KVM: SVM: retrieve VMCB from assembly")
f7ef280132f9 ("KVM: SVM: adjust register allocation for __svm_vcpu_run()")
16fdc1de169e ("KVM: SVM: replace regs argument of __svm_vcpu_run() with vcpu_svm")
debc5a1ec0d1 ("KVM: x86: use a separate asm-offsets.c file")
bb06650634d3 ("KVM: VMX: Convert launched argument to flags")
8bd200d23ec4 ("KVM: VMX: Flatten __vmx_vcpu_run()")
527a534c7326 ("x86/tdx: Provide common base for SEAMCALL and TDCALL C wrappers")
59bd54a84d15 ("x86/tdx: Detect running as a TDX guest in early boot")
6198311093da ("x86/cc: Move arch/x86/{kernel/cc_platform.c => coco/core.c}")
f94909ceb1ed ("x86: Prepare asm files for straight-line-speculation")
22da5a07c75e ("x86/lib/atomic64_386_32: Rename things")
1367afaa2ee9 ("x86/entry: Use the correct fence macro after swapgs in kernel CR3")
c07e45553da1 ("x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry()")
6e5772c8d9cf ("Merge tag 'x86_cc_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f6d58266d731fd7e63163790aad21e0dbb1d5264 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Mon, 7 Nov 2022 04:17:29 -0500
Subject: [PATCH] KVM: SVM: retrieve VMCB from assembly
Continue moving accesses to struct vcpu_svm to vmenter.S. Reducing the
number of arguments limits the chance of mistakes due to different
registers used for argument passing in 32- and 64-bit ABIs; pushing the
VMCB argument and almost immediately popping it into a different
register looks pretty weird.
32-bit ABI is not a concern for __svm_sev_es_vcpu_run() which is 64-bit
only; however, it will soon need @svm to save/restore SPEC_CTRL so stay
consistent with __svm_vcpu_run() and let them share the same prototype.
No functional change intended.
Cc: stable(a)vger.kernel.org
Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk")
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/kvm-asm-offsets.c b/arch/x86/kvm/kvm-asm-offsets.c
index 30db96852e2d..f1b694e431ae 100644
--- a/arch/x86/kvm/kvm-asm-offsets.c
+++ b/arch/x86/kvm/kvm-asm-offsets.c
@@ -15,6 +15,8 @@ static void __used common(void)
if (IS_ENABLED(CONFIG_KVM_AMD)) {
BLANK();
OFFSET(SVM_vcpu_arch_regs, vcpu_svm, vcpu.arch.regs);
+ OFFSET(SVM_current_vmcb, vcpu_svm, current_vmcb);
+ OFFSET(KVM_VMCB_pa, kvm_vmcb_info, pa);
}
if (IS_ENABLED(CONFIG_KVM_INTEL)) {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b412bc5773c5..0c86c435c51f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3914,12 +3914,11 @@ static fastpath_t svm_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
- unsigned long vmcb_pa = svm->current_vmcb->pa;
guest_state_enter_irqoff();
if (sev_es_guest(vcpu->kvm)) {
- __svm_sev_es_vcpu_run(vmcb_pa);
+ __svm_sev_es_vcpu_run(svm);
} else {
struct svm_cpu_data *sd = per_cpu(svm_data, vcpu->cpu);
@@ -3930,7 +3929,7 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
* vmcb02 when switching vmcbs for nested virtualization.
*/
vmload(svm->vmcb01.pa);
- __svm_vcpu_run(vmcb_pa, svm);
+ __svm_vcpu_run(svm);
vmsave(svm->vmcb01.pa);
vmload(__sme_page_pa(sd->save_area));
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 447e25c9101a..7ff1879e73c5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -683,7 +683,7 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm);
/* vmenter.S */
-void __svm_sev_es_vcpu_run(unsigned long vmcb_pa);
-void __svm_vcpu_run(unsigned long vmcb_pa, struct vcpu_svm *svm);
+void __svm_sev_es_vcpu_run(struct vcpu_svm *svm);
+void __svm_vcpu_run(struct vcpu_svm *svm);
#endif
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index 531510ab6072..d07bac1952c5 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -32,7 +32,6 @@
/**
* __svm_vcpu_run - Run a vCPU via a transition to SVM guest mode
- * @vmcb_pa: unsigned long
* @svm: struct vcpu_svm *
*/
SYM_FUNC_START(__svm_vcpu_run)
@@ -49,16 +48,16 @@ SYM_FUNC_START(__svm_vcpu_run)
push %_ASM_BX
/* Save @svm. */
- push %_ASM_ARG2
-
- /* Save @vmcb. */
push %_ASM_ARG1
+.ifnc _ASM_ARG1, _ASM_DI
/* Move @svm to RDI. */
- mov %_ASM_ARG2, %_ASM_DI
+ mov %_ASM_ARG1, %_ASM_DI
+.endif
- /* "POP" @vmcb to RAX. */
- pop %_ASM_AX
+ /* Get svm->current_vmcb->pa into RAX. */
+ mov SVM_current_vmcb(%_ASM_DI), %_ASM_AX
+ mov KVM_VMCB_pa(%_ASM_AX), %_ASM_AX
/* Load guest registers. */
mov VCPU_RCX(%_ASM_DI), %_ASM_CX
@@ -170,7 +169,7 @@ SYM_FUNC_END(__svm_vcpu_run)
/**
* __svm_sev_es_vcpu_run - Run a SEV-ES vCPU via a transition to SVM guest mode
- * @vmcb_pa: unsigned long
+ * @svm: struct vcpu_svm *
*/
SYM_FUNC_START(__svm_sev_es_vcpu_run)
push %_ASM_BP
@@ -185,8 +184,9 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
#endif
push %_ASM_BX
- /* Move @vmcb to RAX. */
- mov %_ASM_ARG1, %_ASM_AX
+ /* Get svm->current_vmcb->pa into RAX. */
+ mov SVM_current_vmcb(%_ASM_ARG1), %_ASM_AX
+ mov KVM_VMCB_pa(%_ASM_AX), %_ASM_AX
/* Enter guest mode */
sti
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f6d58266d731 ("KVM: SVM: retrieve VMCB from assembly")
f7ef280132f9 ("KVM: SVM: adjust register allocation for __svm_vcpu_run()")
16fdc1de169e ("KVM: SVM: replace regs argument of __svm_vcpu_run() with vcpu_svm")
debc5a1ec0d1 ("KVM: x86: use a separate asm-offsets.c file")
bb06650634d3 ("KVM: VMX: Convert launched argument to flags")
8bd200d23ec4 ("KVM: VMX: Flatten __vmx_vcpu_run()")
527a534c7326 ("x86/tdx: Provide common base for SEAMCALL and TDCALL C wrappers")
59bd54a84d15 ("x86/tdx: Detect running as a TDX guest in early boot")
6198311093da ("x86/cc: Move arch/x86/{kernel/cc_platform.c => coco/core.c}")
f94909ceb1ed ("x86: Prepare asm files for straight-line-speculation")
22da5a07c75e ("x86/lib/atomic64_386_32: Rename things")
1367afaa2ee9 ("x86/entry: Use the correct fence macro after swapgs in kernel CR3")
c07e45553da1 ("x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry()")
6e5772c8d9cf ("Merge tag 'x86_cc_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f6d58266d731fd7e63163790aad21e0dbb1d5264 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Mon, 7 Nov 2022 04:17:29 -0500
Subject: [PATCH] KVM: SVM: retrieve VMCB from assembly
Continue moving accesses to struct vcpu_svm to vmenter.S. Reducing the
number of arguments limits the chance of mistakes due to different
registers used for argument passing in 32- and 64-bit ABIs; pushing the
VMCB argument and almost immediately popping it into a different
register looks pretty weird.
32-bit ABI is not a concern for __svm_sev_es_vcpu_run() which is 64-bit
only; however, it will soon need @svm to save/restore SPEC_CTRL so stay
consistent with __svm_vcpu_run() and let them share the same prototype.
No functional change intended.
Cc: stable(a)vger.kernel.org
Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk")
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/kvm-asm-offsets.c b/arch/x86/kvm/kvm-asm-offsets.c
index 30db96852e2d..f1b694e431ae 100644
--- a/arch/x86/kvm/kvm-asm-offsets.c
+++ b/arch/x86/kvm/kvm-asm-offsets.c
@@ -15,6 +15,8 @@ static void __used common(void)
if (IS_ENABLED(CONFIG_KVM_AMD)) {
BLANK();
OFFSET(SVM_vcpu_arch_regs, vcpu_svm, vcpu.arch.regs);
+ OFFSET(SVM_current_vmcb, vcpu_svm, current_vmcb);
+ OFFSET(KVM_VMCB_pa, kvm_vmcb_info, pa);
}
if (IS_ENABLED(CONFIG_KVM_INTEL)) {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b412bc5773c5..0c86c435c51f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3914,12 +3914,11 @@ static fastpath_t svm_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
- unsigned long vmcb_pa = svm->current_vmcb->pa;
guest_state_enter_irqoff();
if (sev_es_guest(vcpu->kvm)) {
- __svm_sev_es_vcpu_run(vmcb_pa);
+ __svm_sev_es_vcpu_run(svm);
} else {
struct svm_cpu_data *sd = per_cpu(svm_data, vcpu->cpu);
@@ -3930,7 +3929,7 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
* vmcb02 when switching vmcbs for nested virtualization.
*/
vmload(svm->vmcb01.pa);
- __svm_vcpu_run(vmcb_pa, svm);
+ __svm_vcpu_run(svm);
vmsave(svm->vmcb01.pa);
vmload(__sme_page_pa(sd->save_area));
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 447e25c9101a..7ff1879e73c5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -683,7 +683,7 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm);
/* vmenter.S */
-void __svm_sev_es_vcpu_run(unsigned long vmcb_pa);
-void __svm_vcpu_run(unsigned long vmcb_pa, struct vcpu_svm *svm);
+void __svm_sev_es_vcpu_run(struct vcpu_svm *svm);
+void __svm_vcpu_run(struct vcpu_svm *svm);
#endif
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index 531510ab6072..d07bac1952c5 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -32,7 +32,6 @@
/**
* __svm_vcpu_run - Run a vCPU via a transition to SVM guest mode
- * @vmcb_pa: unsigned long
* @svm: struct vcpu_svm *
*/
SYM_FUNC_START(__svm_vcpu_run)
@@ -49,16 +48,16 @@ SYM_FUNC_START(__svm_vcpu_run)
push %_ASM_BX
/* Save @svm. */
- push %_ASM_ARG2
-
- /* Save @vmcb. */
push %_ASM_ARG1
+.ifnc _ASM_ARG1, _ASM_DI
/* Move @svm to RDI. */
- mov %_ASM_ARG2, %_ASM_DI
+ mov %_ASM_ARG1, %_ASM_DI
+.endif
- /* "POP" @vmcb to RAX. */
- pop %_ASM_AX
+ /* Get svm->current_vmcb->pa into RAX. */
+ mov SVM_current_vmcb(%_ASM_DI), %_ASM_AX
+ mov KVM_VMCB_pa(%_ASM_AX), %_ASM_AX
/* Load guest registers. */
mov VCPU_RCX(%_ASM_DI), %_ASM_CX
@@ -170,7 +169,7 @@ SYM_FUNC_END(__svm_vcpu_run)
/**
* __svm_sev_es_vcpu_run - Run a SEV-ES vCPU via a transition to SVM guest mode
- * @vmcb_pa: unsigned long
+ * @svm: struct vcpu_svm *
*/
SYM_FUNC_START(__svm_sev_es_vcpu_run)
push %_ASM_BP
@@ -185,8 +184,9 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
#endif
push %_ASM_BX
- /* Move @vmcb to RAX. */
- mov %_ASM_ARG1, %_ASM_AX
+ /* Get svm->current_vmcb->pa into RAX. */
+ mov SVM_current_vmcb(%_ASM_ARG1), %_ASM_AX
+ mov KVM_VMCB_pa(%_ASM_AX), %_ASM_AX
/* Enter guest mode */
sti
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f7ef280132f9 ("KVM: SVM: adjust register allocation for __svm_vcpu_run()")
16fdc1de169e ("KVM: SVM: replace regs argument of __svm_vcpu_run() with vcpu_svm")
debc5a1ec0d1 ("KVM: x86: use a separate asm-offsets.c file")
bb06650634d3 ("KVM: VMX: Convert launched argument to flags")
8bd200d23ec4 ("KVM: VMX: Flatten __vmx_vcpu_run()")
527a534c7326 ("x86/tdx: Provide common base for SEAMCALL and TDCALL C wrappers")
59bd54a84d15 ("x86/tdx: Detect running as a TDX guest in early boot")
6198311093da ("x86/cc: Move arch/x86/{kernel/cc_platform.c => coco/core.c}")
f94909ceb1ed ("x86: Prepare asm files for straight-line-speculation")
22da5a07c75e ("x86/lib/atomic64_386_32: Rename things")
1367afaa2ee9 ("x86/entry: Use the correct fence macro after swapgs in kernel CR3")
c07e45553da1 ("x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry()")
6e5772c8d9cf ("Merge tag 'x86_cc_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f7ef280132f9bf6f82acf5aa5c3c837206eef501 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Fri, 28 Oct 2022 17:30:07 -0400
Subject: [PATCH] KVM: SVM: adjust register allocation for __svm_vcpu_run()
32-bit ABI uses RAX/RCX/RDX as its argument registers, so they are in
the way of instructions that hardcode their operands such as RDMSR/WRMSR
or VMLOAD/VMRUN/VMSAVE.
In preparation for moving vmload/vmsave to __svm_vcpu_run(), keep
the pointer to the struct vcpu_svm in %rdi. In particular, it is now
possible to load svm->vmcb01.pa in %rax without clobbering the struct
vcpu_svm pointer.
No functional change intended.
Cc: stable(a)vger.kernel.org
Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk")
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
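To make the register pressure concrete: WRMSR architecturally takes the MSR index in ECX and the value in EDX:EAX, and VMLOAD/VMRUN/VMSAVE take the VMCB physical address in RAX, so on 32-bit, where the kernel passes the first arguments in EAX/EDX/ECX, none of the argument registers is a safe long-term home for @svm. A hedged sketch of the WRMSR contract in the style of the kernel's native_wrmsr(), shown only to illustrate the hardcoded operands:
/*
 * Illustrative only: the constraint letters pin the operands to the
 * architecturally fixed registers (ECX = index, EDX:EAX = value), which is
 * exactly why RAX/RCX/RDX make poor long-lived homes for @svm in vmenter.S.
 */
static inline void wrmsr_sketch(unsigned int msr, unsigned int low, unsigned int high)
{
	asm volatile("wrmsr" : : "c" (msr), "a" (low), "d" (high) : "memory");
}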
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index f0ff41103e4c..531510ab6072 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -54,29 +54,29 @@ SYM_FUNC_START(__svm_vcpu_run)
/* Save @vmcb. */
push %_ASM_ARG1
- /* Move @svm to RAX. */
- mov %_ASM_ARG2, %_ASM_AX
+ /* Move @svm to RDI. */
+ mov %_ASM_ARG2, %_ASM_DI
+
+ /* "POP" @vmcb to RAX. */
+ pop %_ASM_AX
/* Load guest registers. */
- mov VCPU_RCX(%_ASM_AX), %_ASM_CX
- mov VCPU_RDX(%_ASM_AX), %_ASM_DX
- mov VCPU_RBX(%_ASM_AX), %_ASM_BX
- mov VCPU_RBP(%_ASM_AX), %_ASM_BP
- mov VCPU_RSI(%_ASM_AX), %_ASM_SI
- mov VCPU_RDI(%_ASM_AX), %_ASM_DI
+ mov VCPU_RCX(%_ASM_DI), %_ASM_CX
+ mov VCPU_RDX(%_ASM_DI), %_ASM_DX
+ mov VCPU_RBX(%_ASM_DI), %_ASM_BX
+ mov VCPU_RBP(%_ASM_DI), %_ASM_BP
+ mov VCPU_RSI(%_ASM_DI), %_ASM_SI
#ifdef CONFIG_X86_64
- mov VCPU_R8 (%_ASM_AX), %r8
- mov VCPU_R9 (%_ASM_AX), %r9
- mov VCPU_R10(%_ASM_AX), %r10
- mov VCPU_R11(%_ASM_AX), %r11
- mov VCPU_R12(%_ASM_AX), %r12
- mov VCPU_R13(%_ASM_AX), %r13
- mov VCPU_R14(%_ASM_AX), %r14
- mov VCPU_R15(%_ASM_AX), %r15
+ mov VCPU_R8 (%_ASM_DI), %r8
+ mov VCPU_R9 (%_ASM_DI), %r9
+ mov VCPU_R10(%_ASM_DI), %r10
+ mov VCPU_R11(%_ASM_DI), %r11
+ mov VCPU_R12(%_ASM_DI), %r12
+ mov VCPU_R13(%_ASM_DI), %r13
+ mov VCPU_R14(%_ASM_DI), %r14
+ mov VCPU_R15(%_ASM_DI), %r15
#endif
-
- /* "POP" @vmcb to RAX. */
- pop %_ASM_AX
+ mov VCPU_RDI(%_ASM_DI), %_ASM_DI
/* Enter guest mode */
sti
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f7ef280132f9 ("KVM: SVM: adjust register allocation for __svm_vcpu_run()")
16fdc1de169e ("KVM: SVM: replace regs argument of __svm_vcpu_run() with vcpu_svm")
debc5a1ec0d1 ("KVM: x86: use a separate asm-offsets.c file")
bb06650634d3 ("KVM: VMX: Convert launched argument to flags")
8bd200d23ec4 ("KVM: VMX: Flatten __vmx_vcpu_run()")
527a534c7326 ("x86/tdx: Provide common base for SEAMCALL and TDCALL C wrappers")
59bd54a84d15 ("x86/tdx: Detect running as a TDX guest in early boot")
6198311093da ("x86/cc: Move arch/x86/{kernel/cc_platform.c => coco/core.c}")
f94909ceb1ed ("x86: Prepare asm files for straight-line-speculation")
22da5a07c75e ("x86/lib/atomic64_386_32: Rename things")
1367afaa2ee9 ("x86/entry: Use the correct fence macro after swapgs in kernel CR3")
c07e45553da1 ("x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry()")
6e5772c8d9cf ("Merge tag 'x86_cc_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f7ef280132f9bf6f82acf5aa5c3c837206eef501 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Fri, 28 Oct 2022 17:30:07 -0400
Subject: [PATCH] KVM: SVM: adjust register allocation for __svm_vcpu_run()
32-bit ABI uses RAX/RCX/RDX as its argument registers, so they are in
the way of instructions that hardcode their operands such as RDMSR/WRMSR
or VMLOAD/VMRUN/VMSAVE.
In preparation for moving vmload/vmsave to __svm_vcpu_run(), keep
the pointer to the struct vcpu_svm in %rdi. In particular, it is now
possible to load svm->vmcb01.pa in %rax without clobbering the struct
vcpu_svm pointer.
No functional change intended.
Cc: stable(a)vger.kernel.org
Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk")
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index f0ff41103e4c..531510ab6072 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -54,29 +54,29 @@ SYM_FUNC_START(__svm_vcpu_run)
/* Save @vmcb. */
push %_ASM_ARG1
- /* Move @svm to RAX. */
- mov %_ASM_ARG2, %_ASM_AX
+ /* Move @svm to RDI. */
+ mov %_ASM_ARG2, %_ASM_DI
+
+ /* "POP" @vmcb to RAX. */
+ pop %_ASM_AX
/* Load guest registers. */
- mov VCPU_RCX(%_ASM_AX), %_ASM_CX
- mov VCPU_RDX(%_ASM_AX), %_ASM_DX
- mov VCPU_RBX(%_ASM_AX), %_ASM_BX
- mov VCPU_RBP(%_ASM_AX), %_ASM_BP
- mov VCPU_RSI(%_ASM_AX), %_ASM_SI
- mov VCPU_RDI(%_ASM_AX), %_ASM_DI
+ mov VCPU_RCX(%_ASM_DI), %_ASM_CX
+ mov VCPU_RDX(%_ASM_DI), %_ASM_DX
+ mov VCPU_RBX(%_ASM_DI), %_ASM_BX
+ mov VCPU_RBP(%_ASM_DI), %_ASM_BP
+ mov VCPU_RSI(%_ASM_DI), %_ASM_SI
#ifdef CONFIG_X86_64
- mov VCPU_R8 (%_ASM_AX), %r8
- mov VCPU_R9 (%_ASM_AX), %r9
- mov VCPU_R10(%_ASM_AX), %r10
- mov VCPU_R11(%_ASM_AX), %r11
- mov VCPU_R12(%_ASM_AX), %r12
- mov VCPU_R13(%_ASM_AX), %r13
- mov VCPU_R14(%_ASM_AX), %r14
- mov VCPU_R15(%_ASM_AX), %r15
+ mov VCPU_R8 (%_ASM_DI), %r8
+ mov VCPU_R9 (%_ASM_DI), %r9
+ mov VCPU_R10(%_ASM_DI), %r10
+ mov VCPU_R11(%_ASM_DI), %r11
+ mov VCPU_R12(%_ASM_DI), %r12
+ mov VCPU_R13(%_ASM_DI), %r13
+ mov VCPU_R14(%_ASM_DI), %r14
+ mov VCPU_R15(%_ASM_DI), %r15
#endif
-
- /* "POP" @vmcb to RAX. */
- pop %_ASM_AX
+ mov VCPU_RDI(%_ASM_DI), %_ASM_DI
/* Enter guest mode */
sti
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
9f2febf3f04d ("KVM: SVM: move MSR_IA32_SPEC_CTRL save/restore to assembly")
e287bd005ad9 ("KVM: SVM: restore host save area from assembly")
e61ab42de874 ("KVM: SVM: move guest vmsave/vmload back to assembly")
73412dfeea72 ("KVM: SVM: do not allocate struct svm_cpu_data dynamically")
181d0fb0bb02 ("KVM: SVM: remove dead field from struct svm_cpu_data")
f6d58266d731 ("KVM: SVM: retrieve VMCB from assembly")
f7ef280132f9 ("KVM: SVM: adjust register allocation for __svm_vcpu_run()")
16fdc1de169e ("KVM: SVM: replace regs argument of __svm_vcpu_run() with vcpu_svm")
debc5a1ec0d1 ("KVM: x86: use a separate asm-offsets.c file")
07853adc29a0 ("KVM: VMX: Prevent RSB underflow before vmenter")
fc02735b14ff ("KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS")
bb06650634d3 ("KVM: VMX: Convert launched argument to flags")
8bd200d23ec4 ("KVM: VMX: Flatten __vmx_vcpu_run()")
acac5e98ef8d ("x86/speculation: Remove x86_spec_ctrl_mask")
bbb69e8bee1b ("x86/speculation: Use cached host SPEC_CTRL value for guest entry/exit")
7c693f54c873 ("x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS")
c779bc1a9002 ("x86/bugs: Optimize SPEC_CTRL MSR writes")
caa0ff24d5d0 ("x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value")
a149180fbcf3 ("x86: Add magic AMD return-thunk")
d9e9d2300681 ("x86,objtool: Create .return_sites")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9f2febf3f04daebdaaa5a43cfa20e3844905c0f9 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Fri, 30 Sep 2022 14:24:40 -0400
Subject: [PATCH] KVM: SVM: move MSR_IA32_SPEC_CTRL save/restore to assembly
Restoration of the host IA32_SPEC_CTRL value is probably too late
with respect to the return thunk training sequence.
With respect to the user/kernel boundary, AMD says, "If software chooses
to toggle STIBP (e.g., set STIBP on kernel entry, and clear it on kernel
exit), software should set STIBP to 1 before executing the return thunk
training sequence." I assume the same requirements apply to the guest/host
boundary. The return thunk training sequence is in vmenter.S, quite close
to the VM-exit. On hosts without V_SPEC_CTRL, however, the host's
IA32_SPEC_CTRL value is not restored until much later.
To avoid this, move the restoration of host SPEC_CTRL to assembly and,
for consistency, move the restoration of the guest SPEC_CTRL as well.
This is not particularly difficult, apart from some care to cover both
32- and 64-bit, and to share code between SEV-ES and normal vmentry.
Cc: stable(a)vger.kernel.org
Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk")
Suggested-by: Jim Mattson <jmattson(a)google.com>
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
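The RESTORE_GUEST_SPEC_CTRL_BODY and RESTORE_HOST_SPEC_CTRL_BODY macros added in the diff below implement, in assembly, roughly the following C logic. This is a readability sketch only; the point of the patch is that the real code must live in noinstr assembly between vmexit and the return-thunk untraining, where C cannot run, and it is additionally gated by ALTERNATIVE_2 on X86_FEATURE_MSR_SPEC_CTRL and X86_FEATURE_V_SPEC_CTRL:
/*
 * Hypothetical C rendering of the new vmenter.S logic; field and symbol
 * names are taken from the diff, but this function does not exist in KVM.
 */
static void spec_ctrl_restore_sketch(struct vcpu_svm *svm, bool spec_ctrl_intercepted,
				     bool entering_guest)
{
	u64 hostval = this_cpu_read(x86_spec_ctrl_current);

	if (entering_guest) {
		/* RESTORE_GUEST_SPEC_CTRL_BODY: write the guest value if it differs. */
		if (svm->spec_ctrl != hostval)
			wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
		return;
	}

	/* RESTORE_HOST_SPEC_CTRL_BODY: save what the guest wrote, if not intercepted. */
	if (!spec_ctrl_intercepted)
		svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);

	/* Then restore the host value if it differs from the guest's. */
	if (svm->spec_ctrl != hostval)
		wrmsrl(MSR_IA32_SPEC_CTRL, hostval);
}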
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index da7c361f47e0..6ec0b7ce7453 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -196,22 +196,15 @@ void __init check_bugs(void)
}
/*
- * NOTE: This function is *only* called for SVM. VMX spec_ctrl handling is
- * done in vmenter.S.
+ * NOTE: This function is *only* called for SVM, since Intel uses
+ * MSR_IA32_SPEC_CTRL for SSBD.
*/
void
x86_virt_spec_ctrl(u64 guest_spec_ctrl, u64 guest_virt_spec_ctrl, bool setguest)
{
- u64 msrval, guestval = guest_spec_ctrl, hostval = spec_ctrl_current();
+ u64 guestval, hostval;
struct thread_info *ti = current_thread_info();
- if (static_cpu_has(X86_FEATURE_MSR_SPEC_CTRL)) {
- if (hostval != guestval) {
- msrval = setguest ? guestval : hostval;
- wrmsrl(MSR_IA32_SPEC_CTRL, msrval);
- }
- }
-
/*
* If SSBD is not handled in MSR_SPEC_CTRL on AMD, update
* MSR_AMD64_L2_CFG or MSR_VIRT_SPEC_CTRL if supported.
diff --git a/arch/x86/kvm/kvm-asm-offsets.c b/arch/x86/kvm/kvm-asm-offsets.c
index 1b805cd24d66..24a710d37323 100644
--- a/arch/x86/kvm/kvm-asm-offsets.c
+++ b/arch/x86/kvm/kvm-asm-offsets.c
@@ -16,6 +16,7 @@ static void __used common(void)
BLANK();
OFFSET(SVM_vcpu_arch_regs, vcpu_svm, vcpu.arch.regs);
OFFSET(SVM_current_vmcb, vcpu_svm, current_vmcb);
+ OFFSET(SVM_spec_ctrl, vcpu_svm, spec_ctrl);
OFFSET(SVM_vmcb01, vcpu_svm, vmcb01);
OFFSET(KVM_VMCB_pa, kvm_vmcb_info, pa);
OFFSET(SD_save_area_pa, svm_cpu_data, save_area_pa);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 469c1b5617af..cf1aed25f4ab 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -720,6 +720,15 @@ static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
u32 offset;
u32 *msrpm;
+ /*
+ * For non-nested case:
+ * If the L01 MSR bitmap does not intercept the MSR, then we need to
+ * save it.
+ *
+ * For nested case:
+ * If the L02 MSR bitmap does not intercept the MSR, then we need to
+ * save it.
+ */
msrpm = is_guest_mode(vcpu) ? to_svm(vcpu)->nested.msrpm:
to_svm(vcpu)->msrpm;
@@ -3901,16 +3910,16 @@ static fastpath_t svm_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
return EXIT_FASTPATH_NONE;
}
-static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
+static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_intercepted)
{
struct vcpu_svm *svm = to_svm(vcpu);
guest_state_enter_irqoff();
if (sev_es_guest(vcpu->kvm))
- __svm_sev_es_vcpu_run(svm);
+ __svm_sev_es_vcpu_run(svm, spec_ctrl_intercepted);
else
- __svm_vcpu_run(svm);
+ __svm_vcpu_run(svm, spec_ctrl_intercepted);
guest_state_exit_irqoff();
}
@@ -3918,6 +3927,7 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ bool spec_ctrl_intercepted = msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL);
trace_kvm_entry(vcpu);
@@ -3976,26 +3986,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
- svm_vcpu_enter_exit(vcpu);
-
- /*
- * We do not use IBRS in the kernel. If this vCPU has used the
- * SPEC_CTRL MSR it may have left it on; save the value and
- * turn it off. This is much more efficient than blindly adding
- * it to the atomic save/restore list. Especially as the former
- * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
- *
- * For non-nested case:
- * If the L01 MSR bitmap does not intercept the MSR, then we need to
- * save it.
- *
- * For nested case:
- * If the L02 MSR bitmap does not intercept the MSR, then we need to
- * save it.
- */
- if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL) &&
- unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
- svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
+ svm_vcpu_enter_exit(vcpu, spec_ctrl_intercepted);
if (!sev_es_guest(vcpu->kvm))
reload_tss(vcpu);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 83955a4e520e..199a2ecef1ce 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -682,7 +682,7 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm);
/* vmenter.S */
-void __svm_sev_es_vcpu_run(struct vcpu_svm *svm);
-void __svm_vcpu_run(struct vcpu_svm *svm);
+void __svm_sev_es_vcpu_run(struct vcpu_svm *svm, bool spec_ctrl_intercepted);
+void __svm_vcpu_run(struct vcpu_svm *svm, bool spec_ctrl_intercepted);
#endif
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index 57440acfc73e..34367dc203f2 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -32,9 +32,69 @@
.section .noinstr.text, "ax"
+.macro RESTORE_GUEST_SPEC_CTRL
+ /* No need to do anything if SPEC_CTRL is unset or V_SPEC_CTRL is set */
+ ALTERNATIVE_2 "", \
+ "jmp 800f", X86_FEATURE_MSR_SPEC_CTRL, \
+ "", X86_FEATURE_V_SPEC_CTRL
+801:
+.endm
+.macro RESTORE_GUEST_SPEC_CTRL_BODY
+800:
+ /*
+ * SPEC_CTRL handling: if the guest's SPEC_CTRL value differs from the
+ * host's, write the MSR. This is kept out-of-line so that the common
+ * case does not have to jump.
+ *
+ * IMPORTANT: To avoid RSB underflow attacks and any other nastiness,
+ * there must not be any returns or indirect branches between this code
+ * and vmentry.
+ */
+ movl SVM_spec_ctrl(%_ASM_DI), %eax
+ cmp PER_CPU_VAR(x86_spec_ctrl_current), %eax
+ je 801b
+ mov $MSR_IA32_SPEC_CTRL, %ecx
+ xor %edx, %edx
+ wrmsr
+ jmp 801b
+.endm
+
+.macro RESTORE_HOST_SPEC_CTRL
+ /* No need to do anything if SPEC_CTRL is unset or V_SPEC_CTRL is set */
+ ALTERNATIVE_2 "", \
+ "jmp 900f", X86_FEATURE_MSR_SPEC_CTRL, \
+ "", X86_FEATURE_V_SPEC_CTRL
+901:
+.endm
+.macro RESTORE_HOST_SPEC_CTRL_BODY
+900:
+ /* Same for after vmexit. */
+ mov $MSR_IA32_SPEC_CTRL, %ecx
+
+ /*
+ * Load the value that the guest had written into MSR_IA32_SPEC_CTRL,
+ * if it was not intercepted during guest execution.
+ */
+ cmpb $0, (%_ASM_SP)
+ jnz 998f
+ rdmsr
+ movl %eax, SVM_spec_ctrl(%_ASM_DI)
+998:
+
+ /* Now restore the host value of the MSR if different from the guest's. */
+ movl PER_CPU_VAR(x86_spec_ctrl_current), %eax
+ cmp SVM_spec_ctrl(%_ASM_DI), %eax
+ je 901b
+ xor %edx, %edx
+ wrmsr
+ jmp 901b
+.endm
+
+
/**
* __svm_vcpu_run - Run a vCPU via a transition to SVM guest mode
* @svm: struct vcpu_svm *
+ * @spec_ctrl_intercepted: bool
*/
SYM_FUNC_START(__svm_vcpu_run)
push %_ASM_BP
@@ -54,17 +114,26 @@ SYM_FUNC_START(__svm_vcpu_run)
* order compared to when they are needed.
*/
+ /* Accessed directly from the stack in RESTORE_HOST_SPEC_CTRL. */
+ push %_ASM_ARG2
+
/* Needed to restore access to percpu variables. */
__ASM_SIZE(push) PER_CPU_VAR(svm_data + SD_save_area_pa)
- /* Save @svm. */
+ /* Finally save @svm. */
push %_ASM_ARG1
.ifnc _ASM_ARG1, _ASM_DI
- /* Move @svm to RDI. */
+ /*
+ * Stash @svm in RDI early. On 32-bit, arguments are in RAX, RCX
+ * and RDX which are clobbered by RESTORE_GUEST_SPEC_CTRL.
+ */
mov %_ASM_ARG1, %_ASM_DI
.endif
+ /* Clobbers RAX, RCX, RDX. */
+ RESTORE_GUEST_SPEC_CTRL
+
/*
* Use a single vmcb (vmcb01 because it's always valid) for
* context switching guest state via VMLOAD/VMSAVE, that way
@@ -142,6 +211,9 @@ SYM_FUNC_START(__svm_vcpu_run)
FILL_RETURN_BUFFER %_ASM_AX, RSB_CLEAR_LOOPS, X86_FEATURE_RETPOLINE
#endif
+ /* Clobbers RAX, RCX, RDX. */
+ RESTORE_HOST_SPEC_CTRL
+
/*
* Mitigate RETBleed for AMD/Hygon Zen uarch. RET should be
* untrained as soon as we exit the VM and are back to the
@@ -177,6 +249,9 @@ SYM_FUNC_START(__svm_vcpu_run)
xor %r15d, %r15d
#endif
+ /* "Pop" @spec_ctrl_intercepted. */
+ pop %_ASM_BX
+
pop %_ASM_BX
#ifdef CONFIG_X86_64
@@ -191,6 +266,9 @@ SYM_FUNC_START(__svm_vcpu_run)
pop %_ASM_BP
RET
+ RESTORE_GUEST_SPEC_CTRL_BODY
+ RESTORE_HOST_SPEC_CTRL_BODY
+
10: cmpb $0, kvm_rebooting
jne 2b
ud2
@@ -214,6 +292,7 @@ SYM_FUNC_END(__svm_vcpu_run)
/**
* __svm_sev_es_vcpu_run - Run a SEV-ES vCPU via a transition to SVM guest mode
* @svm: struct vcpu_svm *
+ * @spec_ctrl_intercepted: bool
*/
SYM_FUNC_START(__svm_sev_es_vcpu_run)
push %_ASM_BP
@@ -228,8 +307,30 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
#endif
push %_ASM_BX
+ /*
+ * Save variables needed after vmexit on the stack, in inverse
+ * order compared to when they are needed.
+ */
+
+ /* Accessed directly from the stack in RESTORE_HOST_SPEC_CTRL. */
+ push %_ASM_ARG2
+
+ /* Save @svm. */
+ push %_ASM_ARG1
+
+.ifnc _ASM_ARG1, _ASM_DI
+ /*
+ * Stash @svm in RDI early. On 32-bit, arguments are in RAX, RCX
+ * and RDX which are clobbered by RESTORE_GUEST_SPEC_CTRL.
+ */
+ mov %_ASM_ARG1, %_ASM_DI
+.endif
+
+ /* Clobbers RAX, RCX, RDX. */
+ RESTORE_GUEST_SPEC_CTRL
+
/* Get svm->current_vmcb->pa into RAX. */
- mov SVM_current_vmcb(%_ASM_ARG1), %_ASM_AX
+ mov SVM_current_vmcb(%_ASM_DI), %_ASM_AX
mov KVM_VMCB_pa(%_ASM_AX), %_ASM_AX
/* Enter guest mode */
@@ -239,11 +340,17 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
2: cli
+ /* Pop @svm to RDI, guest registers have been saved already. */
+ pop %_ASM_DI
+
#ifdef CONFIG_RETPOLINE
/* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
FILL_RETURN_BUFFER %_ASM_AX, RSB_CLEAR_LOOPS, X86_FEATURE_RETPOLINE
#endif
+ /* Clobbers RAX, RCX, RDX. */
+ RESTORE_HOST_SPEC_CTRL
+
/*
* Mitigate RETBleed for AMD/Hygon Zen uarch. RET should be
* untrained as soon as we exit the VM and are back to the
@@ -253,6 +360,9 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
*/
UNTRAIN_RET
+ /* "Pop" @spec_ctrl_intercepted. */
+ pop %_ASM_BX
+
pop %_ASM_BX
#ifdef CONFIG_X86_64
@@ -267,6 +377,9 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
pop %_ASM_BP
RET
+ RESTORE_GUEST_SPEC_CTRL_BODY
+ RESTORE_HOST_SPEC_CTRL_BODY
+
3: cmpb $0, kvm_rebooting
jne 2b
ud2
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
9f2febf3f04d ("KVM: SVM: move MSR_IA32_SPEC_CTRL save/restore to assembly")
e287bd005ad9 ("KVM: SVM: restore host save area from assembly")
e61ab42de874 ("KVM: SVM: move guest vmsave/vmload back to assembly")
73412dfeea72 ("KVM: SVM: do not allocate struct svm_cpu_data dynamically")
181d0fb0bb02 ("KVM: SVM: remove dead field from struct svm_cpu_data")
f6d58266d731 ("KVM: SVM: retrieve VMCB from assembly")
f7ef280132f9 ("KVM: SVM: adjust register allocation for __svm_vcpu_run()")
16fdc1de169e ("KVM: SVM: replace regs argument of __svm_vcpu_run() with vcpu_svm")
debc5a1ec0d1 ("KVM: x86: use a separate asm-offsets.c file")
07853adc29a0 ("KVM: VMX: Prevent RSB underflow before vmenter")
fc02735b14ff ("KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS")
bb06650634d3 ("KVM: VMX: Convert launched argument to flags")
8bd200d23ec4 ("KVM: VMX: Flatten __vmx_vcpu_run()")
acac5e98ef8d ("x86/speculation: Remove x86_spec_ctrl_mask")
bbb69e8bee1b ("x86/speculation: Use cached host SPEC_CTRL value for guest entry/exit")
7c693f54c873 ("x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS")
c779bc1a9002 ("x86/bugs: Optimize SPEC_CTRL MSR writes")
caa0ff24d5d0 ("x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value")
a149180fbcf3 ("x86: Add magic AMD return-thunk")
d9e9d2300681 ("x86,objtool: Create .return_sites")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9f2febf3f04daebdaaa5a43cfa20e3844905c0f9 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Fri, 30 Sep 2022 14:24:40 -0400
Subject: [PATCH] KVM: SVM: move MSR_IA32_SPEC_CTRL save/restore to assembly
Restoration of the host IA32_SPEC_CTRL value is probably too late
with respect to the return thunk training sequence.
With respect to the user/kernel boundary, AMD says, "If software chooses
to toggle STIBP (e.g., set STIBP on kernel entry, and clear it on kernel
exit), software should set STIBP to 1 before executing the return thunk
training sequence." I assume the same requirements apply to the guest/host
boundary. The return thunk training sequence is in vmenter.S, quite close
to the VM-exit. On hosts without V_SPEC_CTRL, however, the host's
IA32_SPEC_CTRL value is not restored until much later.
To avoid this, move the restoration of host SPEC_CTRL to assembly and,
for consistency, move the restoration of the guest SPEC_CTRL as well.
This is not particularly difficult, apart from some care to cover both
32- and 64-bit, and to share code between SEV-ES and normal vmentry.
Cc: stable(a)vger.kernel.org
Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk")
Suggested-by: Jim Mattson <jmattson(a)google.com>
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index da7c361f47e0..6ec0b7ce7453 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -196,22 +196,15 @@ void __init check_bugs(void)
}
/*
- * NOTE: This function is *only* called for SVM. VMX spec_ctrl handling is
- * done in vmenter.S.
+ * NOTE: This function is *only* called for SVM, since Intel uses
+ * MSR_IA32_SPEC_CTRL for SSBD.
*/
void
x86_virt_spec_ctrl(u64 guest_spec_ctrl, u64 guest_virt_spec_ctrl, bool setguest)
{
- u64 msrval, guestval = guest_spec_ctrl, hostval = spec_ctrl_current();
+ u64 guestval, hostval;
struct thread_info *ti = current_thread_info();
- if (static_cpu_has(X86_FEATURE_MSR_SPEC_CTRL)) {
- if (hostval != guestval) {
- msrval = setguest ? guestval : hostval;
- wrmsrl(MSR_IA32_SPEC_CTRL, msrval);
- }
- }
-
/*
* If SSBD is not handled in MSR_SPEC_CTRL on AMD, update
* MSR_AMD64_L2_CFG or MSR_VIRT_SPEC_CTRL if supported.
diff --git a/arch/x86/kvm/kvm-asm-offsets.c b/arch/x86/kvm/kvm-asm-offsets.c
index 1b805cd24d66..24a710d37323 100644
--- a/arch/x86/kvm/kvm-asm-offsets.c
+++ b/arch/x86/kvm/kvm-asm-offsets.c
@@ -16,6 +16,7 @@ static void __used common(void)
BLANK();
OFFSET(SVM_vcpu_arch_regs, vcpu_svm, vcpu.arch.regs);
OFFSET(SVM_current_vmcb, vcpu_svm, current_vmcb);
+ OFFSET(SVM_spec_ctrl, vcpu_svm, spec_ctrl);
OFFSET(SVM_vmcb01, vcpu_svm, vmcb01);
OFFSET(KVM_VMCB_pa, kvm_vmcb_info, pa);
OFFSET(SD_save_area_pa, svm_cpu_data, save_area_pa);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 469c1b5617af..cf1aed25f4ab 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -720,6 +720,15 @@ static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
u32 offset;
u32 *msrpm;
+ /*
+ * For non-nested case:
+ * If the L01 MSR bitmap does not intercept the MSR, then we need to
+ * save it.
+ *
+ * For nested case:
+ * If the L02 MSR bitmap does not intercept the MSR, then we need to
+ * save it.
+ */
msrpm = is_guest_mode(vcpu) ? to_svm(vcpu)->nested.msrpm:
to_svm(vcpu)->msrpm;
@@ -3901,16 +3910,16 @@ static fastpath_t svm_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
return EXIT_FASTPATH_NONE;
}
-static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
+static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_intercepted)
{
struct vcpu_svm *svm = to_svm(vcpu);
guest_state_enter_irqoff();
if (sev_es_guest(vcpu->kvm))
- __svm_sev_es_vcpu_run(svm);
+ __svm_sev_es_vcpu_run(svm, spec_ctrl_intercepted);
else
- __svm_vcpu_run(svm);
+ __svm_vcpu_run(svm, spec_ctrl_intercepted);
guest_state_exit_irqoff();
}
@@ -3918,6 +3927,7 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ bool spec_ctrl_intercepted = msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL);
trace_kvm_entry(vcpu);
@@ -3976,26 +3986,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
- svm_vcpu_enter_exit(vcpu);
-
- /*
- * We do not use IBRS in the kernel. If this vCPU has used the
- * SPEC_CTRL MSR it may have left it on; save the value and
- * turn it off. This is much more efficient than blindly adding
- * it to the atomic save/restore list. Especially as the former
- * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
- *
- * For non-nested case:
- * If the L01 MSR bitmap does not intercept the MSR, then we need to
- * save it.
- *
- * For nested case:
- * If the L02 MSR bitmap does not intercept the MSR, then we need to
- * save it.
- */
- if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL) &&
- unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
- svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
+ svm_vcpu_enter_exit(vcpu, spec_ctrl_intercepted);
if (!sev_es_guest(vcpu->kvm))
reload_tss(vcpu);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 83955a4e520e..199a2ecef1ce 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -682,7 +682,7 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm);
/* vmenter.S */
-void __svm_sev_es_vcpu_run(struct vcpu_svm *svm);
-void __svm_vcpu_run(struct vcpu_svm *svm);
+void __svm_sev_es_vcpu_run(struct vcpu_svm *svm, bool spec_ctrl_intercepted);
+void __svm_vcpu_run(struct vcpu_svm *svm, bool spec_ctrl_intercepted);
#endif
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index 57440acfc73e..34367dc203f2 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -32,9 +32,69 @@
.section .noinstr.text, "ax"
+.macro RESTORE_GUEST_SPEC_CTRL
+ /* No need to do anything if SPEC_CTRL is unset or V_SPEC_CTRL is set */
+ ALTERNATIVE_2 "", \
+ "jmp 800f", X86_FEATURE_MSR_SPEC_CTRL, \
+ "", X86_FEATURE_V_SPEC_CTRL
+801:
+.endm
+.macro RESTORE_GUEST_SPEC_CTRL_BODY
+800:
+ /*
+ * SPEC_CTRL handling: if the guest's SPEC_CTRL value differs from the
+ * host's, write the MSR. This is kept out-of-line so that the common
+ * case does not have to jump.
+ *
+ * IMPORTANT: To avoid RSB underflow attacks and any other nastiness,
+ * there must not be any returns or indirect branches between this code
+ * and vmentry.
+ */
+ movl SVM_spec_ctrl(%_ASM_DI), %eax
+ cmp PER_CPU_VAR(x86_spec_ctrl_current), %eax
+ je 801b
+ mov $MSR_IA32_SPEC_CTRL, %ecx
+ xor %edx, %edx
+ wrmsr
+ jmp 801b
+.endm
+
+.macro RESTORE_HOST_SPEC_CTRL
+ /* No need to do anything if SPEC_CTRL is unset or V_SPEC_CTRL is set */
+ ALTERNATIVE_2 "", \
+ "jmp 900f", X86_FEATURE_MSR_SPEC_CTRL, \
+ "", X86_FEATURE_V_SPEC_CTRL
+901:
+.endm
+.macro RESTORE_HOST_SPEC_CTRL_BODY
+900:
+ /* Same for after vmexit. */
+ mov $MSR_IA32_SPEC_CTRL, %ecx
+
+ /*
+ * Load the value that the guest had written into MSR_IA32_SPEC_CTRL,
+ * if it was not intercepted during guest execution.
+ */
+ cmpb $0, (%_ASM_SP)
+ jnz 998f
+ rdmsr
+ movl %eax, SVM_spec_ctrl(%_ASM_DI)
+998:
+
+ /* Now restore the host value of the MSR if different from the guest's. */
+ movl PER_CPU_VAR(x86_spec_ctrl_current), %eax
+ cmp SVM_spec_ctrl(%_ASM_DI), %eax
+ je 901b
+ xor %edx, %edx
+ wrmsr
+ jmp 901b
+.endm
+
+
/**
* __svm_vcpu_run - Run a vCPU via a transition to SVM guest mode
* @svm: struct vcpu_svm *
+ * @spec_ctrl_intercepted: bool
*/
SYM_FUNC_START(__svm_vcpu_run)
push %_ASM_BP
@@ -54,17 +114,26 @@ SYM_FUNC_START(__svm_vcpu_run)
* order compared to when they are needed.
*/
+ /* Accessed directly from the stack in RESTORE_HOST_SPEC_CTRL. */
+ push %_ASM_ARG2
+
/* Needed to restore access to percpu variables. */
__ASM_SIZE(push) PER_CPU_VAR(svm_data + SD_save_area_pa)
- /* Save @svm. */
+ /* Finally save @svm. */
push %_ASM_ARG1
.ifnc _ASM_ARG1, _ASM_DI
- /* Move @svm to RDI. */
+ /*
+ * Stash @svm in RDI early. On 32-bit, arguments are in RAX, RCX
+ * and RDX which are clobbered by RESTORE_GUEST_SPEC_CTRL.
+ */
mov %_ASM_ARG1, %_ASM_DI
.endif
+ /* Clobbers RAX, RCX, RDX. */
+ RESTORE_GUEST_SPEC_CTRL
+
/*
* Use a single vmcb (vmcb01 because it's always valid) for
* context switching guest state via VMLOAD/VMSAVE, that way
@@ -142,6 +211,9 @@ SYM_FUNC_START(__svm_vcpu_run)
FILL_RETURN_BUFFER %_ASM_AX, RSB_CLEAR_LOOPS, X86_FEATURE_RETPOLINE
#endif
+ /* Clobbers RAX, RCX, RDX. */
+ RESTORE_HOST_SPEC_CTRL
+
/*
* Mitigate RETBleed for AMD/Hygon Zen uarch. RET should be
* untrained as soon as we exit the VM and are back to the
@@ -177,6 +249,9 @@ SYM_FUNC_START(__svm_vcpu_run)
xor %r15d, %r15d
#endif
+ /* "Pop" @spec_ctrl_intercepted. */
+ pop %_ASM_BX
+
pop %_ASM_BX
#ifdef CONFIG_X86_64
@@ -191,6 +266,9 @@ SYM_FUNC_START(__svm_vcpu_run)
pop %_ASM_BP
RET
+ RESTORE_GUEST_SPEC_CTRL_BODY
+ RESTORE_HOST_SPEC_CTRL_BODY
+
10: cmpb $0, kvm_rebooting
jne 2b
ud2
@@ -214,6 +292,7 @@ SYM_FUNC_END(__svm_vcpu_run)
/**
* __svm_sev_es_vcpu_run - Run a SEV-ES vCPU via a transition to SVM guest mode
* @svm: struct vcpu_svm *
+ * @spec_ctrl_intercepted: bool
*/
SYM_FUNC_START(__svm_sev_es_vcpu_run)
push %_ASM_BP
@@ -228,8 +307,30 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
#endif
push %_ASM_BX
+ /*
+ * Save variables needed after vmexit on the stack, in inverse
+ * order compared to when they are needed.
+ */
+
+ /* Accessed directly from the stack in RESTORE_HOST_SPEC_CTRL. */
+ push %_ASM_ARG2
+
+ /* Save @svm. */
+ push %_ASM_ARG1
+
+.ifnc _ASM_ARG1, _ASM_DI
+ /*
+ * Stash @svm in RDI early. On 32-bit, arguments are in RAX, RCX
+ * and RDX which are clobbered by RESTORE_GUEST_SPEC_CTRL.
+ */
+ mov %_ASM_ARG1, %_ASM_DI
+.endif
+
+ /* Clobbers RAX, RCX, RDX. */
+ RESTORE_GUEST_SPEC_CTRL
+
/* Get svm->current_vmcb->pa into RAX. */
- mov SVM_current_vmcb(%_ASM_ARG1), %_ASM_AX
+ mov SVM_current_vmcb(%_ASM_DI), %_ASM_AX
mov KVM_VMCB_pa(%_ASM_AX), %_ASM_AX
/* Enter guest mode */
@@ -239,11 +340,17 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
2: cli
+ /* Pop @svm to RDI, guest registers have been saved already. */
+ pop %_ASM_DI
+
#ifdef CONFIG_RETPOLINE
/* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
FILL_RETURN_BUFFER %_ASM_AX, RSB_CLEAR_LOOPS, X86_FEATURE_RETPOLINE
#endif
+ /* Clobbers RAX, RCX, RDX. */
+ RESTORE_HOST_SPEC_CTRL
+
/*
* Mitigate RETBleed for AMD/Hygon Zen uarch. RET should be
* untrained as soon as we exit the VM and are back to the
@@ -253,6 +360,9 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
*/
UNTRAIN_RET
+ /* "Pop" @spec_ctrl_intercepted. */
+ pop %_ASM_BX
+
pop %_ASM_BX
#ifdef CONFIG_X86_64
@@ -267,6 +377,9 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
pop %_ASM_BP
RET
+ RESTORE_GUEST_SPEC_CTRL_BODY
+ RESTORE_HOST_SPEC_CTRL_BODY
+
3: cmpb $0, kvm_rebooting
jne 2b
ud2
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
9f2febf3f04d ("KVM: SVM: move MSR_IA32_SPEC_CTRL save/restore to assembly")
e287bd005ad9 ("KVM: SVM: restore host save area from assembly")
e61ab42de874 ("KVM: SVM: move guest vmsave/vmload back to assembly")
73412dfeea72 ("KVM: SVM: do not allocate struct svm_cpu_data dynamically")
181d0fb0bb02 ("KVM: SVM: remove dead field from struct svm_cpu_data")
f6d58266d731 ("KVM: SVM: retrieve VMCB from assembly")
f7ef280132f9 ("KVM: SVM: adjust register allocation for __svm_vcpu_run()")
16fdc1de169e ("KVM: SVM: replace regs argument of __svm_vcpu_run() with vcpu_svm")
debc5a1ec0d1 ("KVM: x86: use a separate asm-offsets.c file")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9f2febf3f04daebdaaa5a43cfa20e3844905c0f9 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Fri, 30 Sep 2022 14:24:40 -0400
Subject: [PATCH] KVM: SVM: move MSR_IA32_SPEC_CTRL save/restore to assembly
Restoration of the host IA32_SPEC_CTRL value is probably too late
with respect to the return thunk training sequence.
With respect to the user/kernel boundary, AMD says, "If software chooses
to toggle STIBP (e.g., set STIBP on kernel entry, and clear it on kernel
exit), software should set STIBP to 1 before executing the return thunk
training sequence." I assume the same requirements apply to the guest/host
boundary. The return thunk training sequence is in vmenter.S, quite close
to the VM-exit. On hosts without V_SPEC_CTRL, however, the host's
IA32_SPEC_CTRL value is not restored until much later.
To avoid this, move the restoration of host SPEC_CTRL to assembly and,
for consistency, move the restoration of the guest SPEC_CTRL as well.
This is not particularly difficult, apart from some care to cover both
32- and 64-bit, and to share code between SEV-ES and normal vmentry.
Cc: stable(a)vger.kernel.org
Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk")
Suggested-by: Jim Mattson <jmattson(a)google.com>
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index da7c361f47e0..6ec0b7ce7453 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -196,22 +196,15 @@ void __init check_bugs(void)
}
/*
- * NOTE: This function is *only* called for SVM. VMX spec_ctrl handling is
- * done in vmenter.S.
+ * NOTE: This function is *only* called for SVM, since Intel uses
+ * MSR_IA32_SPEC_CTRL for SSBD.
*/
void
x86_virt_spec_ctrl(u64 guest_spec_ctrl, u64 guest_virt_spec_ctrl, bool setguest)
{
- u64 msrval, guestval = guest_spec_ctrl, hostval = spec_ctrl_current();
+ u64 guestval, hostval;
struct thread_info *ti = current_thread_info();
- if (static_cpu_has(X86_FEATURE_MSR_SPEC_CTRL)) {
- if (hostval != guestval) {
- msrval = setguest ? guestval : hostval;
- wrmsrl(MSR_IA32_SPEC_CTRL, msrval);
- }
- }
-
/*
* If SSBD is not handled in MSR_SPEC_CTRL on AMD, update
* MSR_AMD64_L2_CFG or MSR_VIRT_SPEC_CTRL if supported.
diff --git a/arch/x86/kvm/kvm-asm-offsets.c b/arch/x86/kvm/kvm-asm-offsets.c
index 1b805cd24d66..24a710d37323 100644
--- a/arch/x86/kvm/kvm-asm-offsets.c
+++ b/arch/x86/kvm/kvm-asm-offsets.c
@@ -16,6 +16,7 @@ static void __used common(void)
BLANK();
OFFSET(SVM_vcpu_arch_regs, vcpu_svm, vcpu.arch.regs);
OFFSET(SVM_current_vmcb, vcpu_svm, current_vmcb);
+ OFFSET(SVM_spec_ctrl, vcpu_svm, spec_ctrl);
OFFSET(SVM_vmcb01, vcpu_svm, vmcb01);
OFFSET(KVM_VMCB_pa, kvm_vmcb_info, pa);
OFFSET(SD_save_area_pa, svm_cpu_data, save_area_pa);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 469c1b5617af..cf1aed25f4ab 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -720,6 +720,15 @@ static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
u32 offset;
u32 *msrpm;
+ /*
+ * For non-nested case:
+ * If the L01 MSR bitmap does not intercept the MSR, then we need to
+ * save it.
+ *
+ * For nested case:
+ * If the L02 MSR bitmap does not intercept the MSR, then we need to
+ * save it.
+ */
msrpm = is_guest_mode(vcpu) ? to_svm(vcpu)->nested.msrpm:
to_svm(vcpu)->msrpm;
@@ -3901,16 +3910,16 @@ static fastpath_t svm_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
return EXIT_FASTPATH_NONE;
}
-static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
+static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_intercepted)
{
struct vcpu_svm *svm = to_svm(vcpu);
guest_state_enter_irqoff();
if (sev_es_guest(vcpu->kvm))
- __svm_sev_es_vcpu_run(svm);
+ __svm_sev_es_vcpu_run(svm, spec_ctrl_intercepted);
else
- __svm_vcpu_run(svm);
+ __svm_vcpu_run(svm, spec_ctrl_intercepted);
guest_state_exit_irqoff();
}
@@ -3918,6 +3927,7 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ bool spec_ctrl_intercepted = msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL);
trace_kvm_entry(vcpu);
@@ -3976,26 +3986,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);
- svm_vcpu_enter_exit(vcpu);
-
- /*
- * We do not use IBRS in the kernel. If this vCPU has used the
- * SPEC_CTRL MSR it may have left it on; save the value and
- * turn it off. This is much more efficient than blindly adding
- * it to the atomic save/restore list. Especially as the former
- * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
- *
- * For non-nested case:
- * If the L01 MSR bitmap does not intercept the MSR, then we need to
- * save it.
- *
- * For nested case:
- * If the L02 MSR bitmap does not intercept the MSR, then we need to
- * save it.
- */
- if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL) &&
- unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
- svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
+ svm_vcpu_enter_exit(vcpu, spec_ctrl_intercepted);
if (!sev_es_guest(vcpu->kvm))
reload_tss(vcpu);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 83955a4e520e..199a2ecef1ce 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -682,7 +682,7 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm);
/* vmenter.S */
-void __svm_sev_es_vcpu_run(struct vcpu_svm *svm);
-void __svm_vcpu_run(struct vcpu_svm *svm);
+void __svm_sev_es_vcpu_run(struct vcpu_svm *svm, bool spec_ctrl_intercepted);
+void __svm_vcpu_run(struct vcpu_svm *svm, bool spec_ctrl_intercepted);
#endif
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index 57440acfc73e..34367dc203f2 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -32,9 +32,69 @@
.section .noinstr.text, "ax"
+.macro RESTORE_GUEST_SPEC_CTRL
+ /* No need to do anything if SPEC_CTRL is unset or V_SPEC_CTRL is set */
+ ALTERNATIVE_2 "", \
+ "jmp 800f", X86_FEATURE_MSR_SPEC_CTRL, \
+ "", X86_FEATURE_V_SPEC_CTRL
+801:
+.endm
+.macro RESTORE_GUEST_SPEC_CTRL_BODY
+800:
+ /*
+ * SPEC_CTRL handling: if the guest's SPEC_CTRL value differs from the
+ * host's, write the MSR. This is kept out-of-line so that the common
+ * case does not have to jump.
+ *
+ * IMPORTANT: To avoid RSB underflow attacks and any other nastiness,
+ * there must not be any returns or indirect branches between this code
+ * and vmentry.
+ */
+ movl SVM_spec_ctrl(%_ASM_DI), %eax
+ cmp PER_CPU_VAR(x86_spec_ctrl_current), %eax
+ je 801b
+ mov $MSR_IA32_SPEC_CTRL, %ecx
+ xor %edx, %edx
+ wrmsr
+ jmp 801b
+.endm
+
+.macro RESTORE_HOST_SPEC_CTRL
+ /* No need to do anything if SPEC_CTRL is unset or V_SPEC_CTRL is set */
+ ALTERNATIVE_2 "", \
+ "jmp 900f", X86_FEATURE_MSR_SPEC_CTRL, \
+ "", X86_FEATURE_V_SPEC_CTRL
+901:
+.endm
+.macro RESTORE_HOST_SPEC_CTRL_BODY
+900:
+ /* Same for after vmexit. */
+ mov $MSR_IA32_SPEC_CTRL, %ecx
+
+ /*
+ * Load the value that the guest had written into MSR_IA32_SPEC_CTRL,
+ * if it was not intercepted during guest execution.
+ */
+ cmpb $0, (%_ASM_SP)
+ jnz 998f
+ rdmsr
+ movl %eax, SVM_spec_ctrl(%_ASM_DI)
+998:
+
+ /* Now restore the host value of the MSR if different from the guest's. */
+ movl PER_CPU_VAR(x86_spec_ctrl_current), %eax
+ cmp SVM_spec_ctrl(%_ASM_DI), %eax
+ je 901b
+ xor %edx, %edx
+ wrmsr
+ jmp 901b
+.endm
+
+
/**
* __svm_vcpu_run - Run a vCPU via a transition to SVM guest mode
* @svm: struct vcpu_svm *
+ * @spec_ctrl_intercepted: bool
*/
SYM_FUNC_START(__svm_vcpu_run)
push %_ASM_BP
@@ -54,17 +114,26 @@ SYM_FUNC_START(__svm_vcpu_run)
* order compared to when they are needed.
*/
+ /* Accessed directly from the stack in RESTORE_HOST_SPEC_CTRL. */
+ push %_ASM_ARG2
+
/* Needed to restore access to percpu variables. */
__ASM_SIZE(push) PER_CPU_VAR(svm_data + SD_save_area_pa)
- /* Save @svm. */
+ /* Finally save @svm. */
push %_ASM_ARG1
.ifnc _ASM_ARG1, _ASM_DI
- /* Move @svm to RDI. */
+ /*
+ * Stash @svm in RDI early. On 32-bit, arguments are in RAX, RCX
+ * and RDX which are clobbered by RESTORE_GUEST_SPEC_CTRL.
+ */
mov %_ASM_ARG1, %_ASM_DI
.endif
+ /* Clobbers RAX, RCX, RDX. */
+ RESTORE_GUEST_SPEC_CTRL
+
/*
* Use a single vmcb (vmcb01 because it's always valid) for
* context switching guest state via VMLOAD/VMSAVE, that way
@@ -142,6 +211,9 @@ SYM_FUNC_START(__svm_vcpu_run)
FILL_RETURN_BUFFER %_ASM_AX, RSB_CLEAR_LOOPS, X86_FEATURE_RETPOLINE
#endif
+ /* Clobbers RAX, RCX, RDX. */
+ RESTORE_HOST_SPEC_CTRL
+
/*
* Mitigate RETBleed for AMD/Hygon Zen uarch. RET should be
* untrained as soon as we exit the VM and are back to the
@@ -177,6 +249,9 @@ SYM_FUNC_START(__svm_vcpu_run)
xor %r15d, %r15d
#endif
+ /* "Pop" @spec_ctrl_intercepted. */
+ pop %_ASM_BX
+
pop %_ASM_BX
#ifdef CONFIG_X86_64
@@ -191,6 +266,9 @@ SYM_FUNC_START(__svm_vcpu_run)
pop %_ASM_BP
RET
+ RESTORE_GUEST_SPEC_CTRL_BODY
+ RESTORE_HOST_SPEC_CTRL_BODY
+
10: cmpb $0, kvm_rebooting
jne 2b
ud2
@@ -214,6 +292,7 @@ SYM_FUNC_END(__svm_vcpu_run)
/**
* __svm_sev_es_vcpu_run - Run a SEV-ES vCPU via a transition to SVM guest mode
* @svm: struct vcpu_svm *
+ * @spec_ctrl_intercepted: bool
*/
SYM_FUNC_START(__svm_sev_es_vcpu_run)
push %_ASM_BP
@@ -228,8 +307,30 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
#endif
push %_ASM_BX
+ /*
+ * Save variables needed after vmexit on the stack, in inverse
+ * order compared to when they are needed.
+ */
+
+ /* Accessed directly from the stack in RESTORE_HOST_SPEC_CTRL. */
+ push %_ASM_ARG2
+
+ /* Save @svm. */
+ push %_ASM_ARG1
+
+.ifnc _ASM_ARG1, _ASM_DI
+ /*
+ * Stash @svm in RDI early. On 32-bit, arguments are in RAX, RCX
+ * and RDX which are clobbered by RESTORE_GUEST_SPEC_CTRL.
+ */
+ mov %_ASM_ARG1, %_ASM_DI
+.endif
+
+ /* Clobbers RAX, RCX, RDX. */
+ RESTORE_GUEST_SPEC_CTRL
+
/* Get svm->current_vmcb->pa into RAX. */
- mov SVM_current_vmcb(%_ASM_ARG1), %_ASM_AX
+ mov SVM_current_vmcb(%_ASM_DI), %_ASM_AX
mov KVM_VMCB_pa(%_ASM_AX), %_ASM_AX
/* Enter guest mode */
@@ -239,11 +340,17 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
2: cli
+ /* Pop @svm to RDI, guest registers have been saved already. */
+ pop %_ASM_DI
+
#ifdef CONFIG_RETPOLINE
/* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
FILL_RETURN_BUFFER %_ASM_AX, RSB_CLEAR_LOOPS, X86_FEATURE_RETPOLINE
#endif
+ /* Clobbers RAX, RCX, RDX. */
+ RESTORE_HOST_SPEC_CTRL
+
/*
* Mitigate RETBleed for AMD/Hygon Zen uarch. RET should be
* untrained as soon as we exit the VM and are back to the
@@ -253,6 +360,9 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
*/
UNTRAIN_RET
+ /* "Pop" @spec_ctrl_intercepted. */
+ pop %_ASM_BX
+
pop %_ASM_BX
#ifdef CONFIG_X86_64
@@ -267,6 +377,9 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
pop %_ASM_BP
RET
+ RESTORE_GUEST_SPEC_CTRL_BODY
+ RESTORE_HOST_SPEC_CTRL_BODY
+
3: cmpb $0, kvm_rebooting
jne 2b
ud2
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
debc5a1ec0d1 ("KVM: x86: use a separate asm-offsets.c file")
bb06650634d3 ("KVM: VMX: Convert launched argument to flags")
8bd200d23ec4 ("KVM: VMX: Flatten __vmx_vcpu_run()")
527a534c7326 ("x86/tdx: Provide common base for SEAMCALL and TDCALL C wrappers")
59bd54a84d15 ("x86/tdx: Detect running as a TDX guest in early boot")
6198311093da ("x86/cc: Move arch/x86/{kernel/cc_platform.c => coco/core.c}")
f94909ceb1ed ("x86: Prepare asm files for straight-line-speculation")
22da5a07c75e ("x86/lib/atomic64_386_32: Rename things")
1367afaa2ee9 ("x86/entry: Use the correct fence macro after swapgs in kernel CR3")
c07e45553da1 ("x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry()")
6e5772c8d9cf ("Merge tag 'x86_cc_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From debc5a1ec0d195ffea70d11efeffb713de9fdbc7 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Tue, 8 Nov 2022 09:44:53 +0100
Subject: [PATCH] KVM: x86: use a separate asm-offsets.c file
This already removes an ugly #include "" from asm-offsets.c, but
especially it avoids a future error when trying to define asm-offsets
for KVM's svm/svm.h header.
This would not work for kernel/asm-offsets.c, because svm/svm.h
includes kvm_cache_regs.h which is not in the include path when
compiling asm-offsets.c. The problem is not there if the .c file is
in arch/x86/kvm.
Suggested-by: Sean Christopherson <seanjc(a)google.com>
Cc: stable(a)vger.kernel.org
Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk")
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
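For context, kvm-asm-offsets.c works the same way as the top-level asm-offsets.c: each OFFSET() invocation emits a marker that the filechk offsets rule turns into a #define, so the generated kvm-asm-offsets.h included by vmenter.S looks roughly like the sketch below (the numeric value is a placeholder; the real one is whatever offsetof() evaluates to at build time):
/* Sketch of the generated arch/x86/kvm/kvm-asm-offsets.h. */
#ifndef __KVM_ASM_OFFSETS_H__
#define __KVM_ASM_OFFSETS_H__

#define VMX_spec_ctrl 10944 /* offsetof(struct vcpu_vmx, spec_ctrl), placeholder value */

#endif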
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index cb50589a7102..437308004ef2 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -19,7 +19,6 @@
#include <asm/suspend.h>
#include <asm/tlbflush.h>
#include <asm/tdx.h>
-#include "../kvm/vmx/vmx.h"
#ifdef CONFIG_XEN
#include <xen/interface/xen.h>
@@ -108,9 +107,4 @@ static void __used common(void)
OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
OFFSET(TSS_sp1, tss_struct, x86_tss.sp1);
OFFSET(TSS_sp2, tss_struct, x86_tss.sp2);
-
- if (IS_ENABLED(CONFIG_KVM_INTEL)) {
- BLANK();
- OFFSET(VMX_spec_ctrl, vcpu_vmx, spec_ctrl);
- }
}
diff --git a/arch/x86/kvm/.gitignore b/arch/x86/kvm/.gitignore
new file mode 100644
index 000000000000..615d6ff35c00
--- /dev/null
+++ b/arch/x86/kvm/.gitignore
@@ -0,0 +1,2 @@
+/kvm-asm-offsets.s
+/kvm-asm-offsets.h
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 30f244b64523..a02cf9baacc8 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -34,3 +34,12 @@ endif
obj-$(CONFIG_KVM) += kvm.o
obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
obj-$(CONFIG_KVM_AMD) += kvm-amd.o
+
+AFLAGS_vmx/vmenter.o := -iquote $(obj)
+$(obj)/vmx/vmenter.o: $(obj)/kvm-asm-offsets.h
+
+$(obj)/kvm-asm-offsets.h: $(obj)/kvm-asm-offsets.s FORCE
+ $(call filechk,offsets,__KVM_ASM_OFFSETS_H__)
+
+targets += kvm-asm-offsets.s
+clean-files += kvm-asm-offsets.h
diff --git a/arch/x86/kvm/kvm-asm-offsets.c b/arch/x86/kvm/kvm-asm-offsets.c
new file mode 100644
index 000000000000..9d84f2b32d7f
--- /dev/null
+++ b/arch/x86/kvm/kvm-asm-offsets.c
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Generate definitions needed by assembly language modules.
+ * This code generates raw asm output which is post-processed to extract
+ * and format the required data.
+ */
+#define COMPILE_OFFSETS
+
+#include <linux/kbuild.h>
+#include "vmx/vmx.h"
+
+static void __used common(void)
+{
+ if (IS_ENABLED(CONFIG_KVM_INTEL)) {
+ BLANK();
+ OFFSET(VMX_spec_ctrl, vcpu_vmx, spec_ctrl);
+ }
+}
diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 8477d8bdd69c..0b5db4de4d09 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -1,12 +1,12 @@
/* SPDX-License-Identifier: GPL-2.0 */
#include <linux/linkage.h>
#include <asm/asm.h>
-#include <asm/asm-offsets.h>
#include <asm/bitsperlong.h>
#include <asm/kvm_vcpu_regs.h>
#include <asm/nospec-branch.h>
#include <asm/percpu.h>
#include <asm/segment.h>
+#include "kvm-asm-offsets.h"
#include "run_flags.h"
#define WORD_SIZE (BITS_PER_LONG / 8)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
debc5a1ec0d1 ("KVM: x86: use a separate asm-offsets.c file")
bb06650634d3 ("KVM: VMX: Convert launched argument to flags")
8bd200d23ec4 ("KVM: VMX: Flatten __vmx_vcpu_run()")
527a534c7326 ("x86/tdx: Provide common base for SEAMCALL and TDCALL C wrappers")
59bd54a84d15 ("x86/tdx: Detect running as a TDX guest in early boot")
6198311093da ("x86/cc: Move arch/x86/{kernel/cc_platform.c => coco/core.c}")
f94909ceb1ed ("x86: Prepare asm files for straight-line-speculation")
22da5a07c75e ("x86/lib/atomic64_386_32: Rename things")
1367afaa2ee9 ("x86/entry: Use the correct fence macro after swapgs in kernel CR3")
c07e45553da1 ("x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry()")
6e5772c8d9cf ("Merge tag 'x86_cc_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From debc5a1ec0d195ffea70d11efeffb713de9fdbc7 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Tue, 8 Nov 2022 09:44:53 +0100
Subject: [PATCH] KVM: x86: use a separate asm-offsets.c file
This already removes an ugly #include "" from asm-offsets.c, but
especially it avoids a future error when trying to define asm-offsets
for KVM's svm/svm.h header.
This would not work for kernel/asm-offsets.c, because svm/svm.h
includes kvm_cache_regs.h which is not in the include path when
compiling asm-offsets.c. The problem is not there if the .c file is
in arch/x86/kvm.
Suggested-by: Sean Christopherson <seanjc(a)google.com>
Cc: stable(a)vger.kernel.org
Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk")
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index cb50589a7102..437308004ef2 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -19,7 +19,6 @@
#include <asm/suspend.h>
#include <asm/tlbflush.h>
#include <asm/tdx.h>
-#include "../kvm/vmx/vmx.h"
#ifdef CONFIG_XEN
#include <xen/interface/xen.h>
@@ -108,9 +107,4 @@ static void __used common(void)
OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
OFFSET(TSS_sp1, tss_struct, x86_tss.sp1);
OFFSET(TSS_sp2, tss_struct, x86_tss.sp2);
-
- if (IS_ENABLED(CONFIG_KVM_INTEL)) {
- BLANK();
- OFFSET(VMX_spec_ctrl, vcpu_vmx, spec_ctrl);
- }
}
diff --git a/arch/x86/kvm/.gitignore b/arch/x86/kvm/.gitignore
new file mode 100644
index 000000000000..615d6ff35c00
--- /dev/null
+++ b/arch/x86/kvm/.gitignore
@@ -0,0 +1,2 @@
+/kvm-asm-offsets.s
+/kvm-asm-offsets.h
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 30f244b64523..a02cf9baacc8 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -34,3 +34,12 @@ endif
obj-$(CONFIG_KVM) += kvm.o
obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
obj-$(CONFIG_KVM_AMD) += kvm-amd.o
+
+AFLAGS_vmx/vmenter.o := -iquote $(obj)
+$(obj)/vmx/vmenter.o: $(obj)/kvm-asm-offsets.h
+
+$(obj)/kvm-asm-offsets.h: $(obj)/kvm-asm-offsets.s FORCE
+ $(call filechk,offsets,__KVM_ASM_OFFSETS_H__)
+
+targets += kvm-asm-offsets.s
+clean-files += kvm-asm-offsets.h
diff --git a/arch/x86/kvm/kvm-asm-offsets.c b/arch/x86/kvm/kvm-asm-offsets.c
new file mode 100644
index 000000000000..9d84f2b32d7f
--- /dev/null
+++ b/arch/x86/kvm/kvm-asm-offsets.c
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Generate definitions needed by assembly language modules.
+ * This code generates raw asm output which is post-processed to extract
+ * and format the required data.
+ */
+#define COMPILE_OFFSETS
+
+#include <linux/kbuild.h>
+#include "vmx/vmx.h"
+
+static void __used common(void)
+{
+ if (IS_ENABLED(CONFIG_KVM_INTEL)) {
+ BLANK();
+ OFFSET(VMX_spec_ctrl, vcpu_vmx, spec_ctrl);
+ }
+}
diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 8477d8bdd69c..0b5db4de4d09 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -1,12 +1,12 @@
/* SPDX-License-Identifier: GPL-2.0 */
#include <linux/linkage.h>
#include <asm/asm.h>
-#include <asm/asm-offsets.h>
#include <asm/bitsperlong.h>
#include <asm/kvm_vcpu_regs.h>
#include <asm/nospec-branch.h>
#include <asm/percpu.h>
#include <asm/segment.h>
+#include "kvm-asm-offsets.h"
#include "run_flags.h"
#define WORD_SIZE (BITS_PER_LONG / 8)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8631ef59b622 ("KVM: x86/pmu: Do not speculatively query Intel GP PMCs that don't exist yet")
902caeb6841a ("KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS")
8183a538cd95 ("KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS")
6ebe44366bde ("KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR counter")
79f3e3b58386 ("KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter")
c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
0d23dc34a7ce ("x86/perf/core: Add pebs_capable to store valid PEBS_COUNTER_MASK value")
2c985527dd8d ("KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter")
39a4d779546a ("perf/x86/core: Pass "struct kvm_pmu *" to determine the guest values")
38904911e864 ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8631ef59b62290c7d88e7209e35dfb47f33f4902 Mon Sep 17 00:00:00 2001
From: Like Xu <likexu(a)tencent.com>
Date: Mon, 19 Sep 2022 17:10:06 +0800
Subject: [PATCH] KVM: x86/pmu: Do not speculatively query Intel GP PMCs that
don't exist yet
The SDM lists an architectural MSR IA32_CORE_CAPABILITIES (0xCF)
that limits the theoretical maximum value of the Intel GP PMC MSRs
allocated at 0xC1 to 14; likewise the Intel April 2022 SDM adds
IA32_OVERCLOCKING_STATUS at 0x195 which limits the number of event
selection MSRs to 15 (0x186-0x194).
Limiting the maximum number of counters to 14 or 18 based on the currently
allocated MSRs is clearly fragile, and it seems likely that Intel will
even place PMCs 8-15 at a completely different range of MSR indices.
So stop at the maximum number of GP PMCs supported today on Intel
processors.
There are some machines, like Intel P4 with non Architectural PMU, that
may indeed have 18 counters, but those counters are in a completely
different MSR address range and are not supported by KVM.
Cc: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Cc: stable(a)vger.kernel.org
Fixes: cf05a67b68b8 ("KVM: x86: omit "impossible" pmu MSRs from MSR list")
Suggested-by: Jim Mattson <jmattson(a)google.com>
Signed-off-by: Like Xu <likexu(a)tencent.com>
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Message-Id: <20220919091008.60695-1-likexu(a)tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
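The address arithmetic behind the 14 and 15 limits quoted above is easy to check against the MSR map (indices as given in the commit message; MSR_IA32_CORE_CAPS matches the kernel's msr-index.h name, the overclocking entry follows the SDM naming):
/*
 * GP counters start at 0xc1: 0xc1 + 14 = 0xcf, already taken by
 * IA32_CORE_CAPABILITIES, so at most 14 contiguous PERFCTR MSRs fit.
 * Event selects start at 0x186: 0x186 + 15 = 0x195, taken by
 * IA32_OVERCLOCKING_STATUS, so at most 15 EVENTSEL MSRs (0x186-0x194) fit.
 */
#define MSR_ARCH_PERFMON_PERFCTR0	0x000000c1
#define MSR_IA32_CORE_CAPS		0x000000cf
#define MSR_ARCH_PERFMON_EVENTSEL0	0x00000186
#define MSR_IA32_OVERCLOCKING_STATUS	0x00000195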
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5f5eb577d583..73716fab120f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1442,20 +1442,10 @@ static const u32 msrs_to_save_all[] = {
MSR_ARCH_PERFMON_PERFCTR0 + 2, MSR_ARCH_PERFMON_PERFCTR0 + 3,
MSR_ARCH_PERFMON_PERFCTR0 + 4, MSR_ARCH_PERFMON_PERFCTR0 + 5,
MSR_ARCH_PERFMON_PERFCTR0 + 6, MSR_ARCH_PERFMON_PERFCTR0 + 7,
- MSR_ARCH_PERFMON_PERFCTR0 + 8, MSR_ARCH_PERFMON_PERFCTR0 + 9,
- MSR_ARCH_PERFMON_PERFCTR0 + 10, MSR_ARCH_PERFMON_PERFCTR0 + 11,
- MSR_ARCH_PERFMON_PERFCTR0 + 12, MSR_ARCH_PERFMON_PERFCTR0 + 13,
- MSR_ARCH_PERFMON_PERFCTR0 + 14, MSR_ARCH_PERFMON_PERFCTR0 + 15,
- MSR_ARCH_PERFMON_PERFCTR0 + 16, MSR_ARCH_PERFMON_PERFCTR0 + 17,
MSR_ARCH_PERFMON_EVENTSEL0, MSR_ARCH_PERFMON_EVENTSEL1,
MSR_ARCH_PERFMON_EVENTSEL0 + 2, MSR_ARCH_PERFMON_EVENTSEL0 + 3,
MSR_ARCH_PERFMON_EVENTSEL0 + 4, MSR_ARCH_PERFMON_EVENTSEL0 + 5,
MSR_ARCH_PERFMON_EVENTSEL0 + 6, MSR_ARCH_PERFMON_EVENTSEL0 + 7,
- MSR_ARCH_PERFMON_EVENTSEL0 + 8, MSR_ARCH_PERFMON_EVENTSEL0 + 9,
- MSR_ARCH_PERFMON_EVENTSEL0 + 10, MSR_ARCH_PERFMON_EVENTSEL0 + 11,
- MSR_ARCH_PERFMON_EVENTSEL0 + 12, MSR_ARCH_PERFMON_EVENTSEL0 + 13,
- MSR_ARCH_PERFMON_EVENTSEL0 + 14, MSR_ARCH_PERFMON_EVENTSEL0 + 15,
- MSR_ARCH_PERFMON_EVENTSEL0 + 16, MSR_ARCH_PERFMON_EVENTSEL0 + 17,
MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
MSR_K7_EVNTSEL0, MSR_K7_EVNTSEL1, MSR_K7_EVNTSEL2, MSR_K7_EVNTSEL3,
@@ -7041,12 +7031,12 @@ static void kvm_init_msr_list(void)
intel_pt_validate_hw_cap(PT_CAP_num_address_ranges) * 2)
continue;
break;
- case MSR_ARCH_PERFMON_PERFCTR0 ... MSR_ARCH_PERFMON_PERFCTR0 + 17:
+ case MSR_ARCH_PERFMON_PERFCTR0 ... MSR_ARCH_PERFMON_PERFCTR0 + 7:
if (msrs_to_save_all[i] - MSR_ARCH_PERFMON_PERFCTR0 >=
min(INTEL_PMC_MAX_GENERIC, kvm_pmu_cap.num_counters_gp))
continue;
break;
- case MSR_ARCH_PERFMON_EVENTSEL0 ... MSR_ARCH_PERFMON_EVENTSEL0 + 17:
+ case MSR_ARCH_PERFMON_EVENTSEL0 ... MSR_ARCH_PERFMON_EVENTSEL0 + 7:
if (msrs_to_save_all[i] - MSR_ARCH_PERFMON_EVENTSEL0 >=
min(INTEL_PMC_MAX_GENERIC, kvm_pmu_cap.num_counters_gp))
continue;
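
For reference, the filtering that remains after this change can be shown with a small standalone sketch (not the in-tree code; INTEL_PMC_MAX_GENERIC and the host counter count below are stand-in values): an MSR is only advertised if its index falls below both the architectural cap and the number of GP counters the host PMU actually reports.

#include <stdio.h>

#define MSR_ARCH_PERFMON_PERFCTR0  0xc1
#define INTEL_PMC_MAX_GENERIC      8        /* stand-in for the real cap */

static unsigned int num_counters_gp = 4;    /* example host PMU value */

/* Same shape as the range cases in kvm_init_msr_list() above. */
static int perfctr_listed(unsigned int msr)
{
	unsigned int idx = msr - MSR_ARCH_PERFMON_PERFCTR0;
	unsigned int lim = INTEL_PMC_MAX_GENERIC < num_counters_gp ?
			   INTEL_PMC_MAX_GENERIC : num_counters_gp;

	return idx < lim;
}

int main(void)
{
	for (unsigned int i = 0; i < 8; i++)
		printf("PERFCTR%u: %s\n", i,
		       perfctr_listed(MSR_ARCH_PERFMON_PERFCTR0 + i) ?
		       "listed" : "skipped");
	return 0;
}
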
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8631ef59b622 ("KVM: x86/pmu: Do not speculatively query Intel GP PMCs that don't exist yet")
902caeb6841a ("KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS")
8183a538cd95 ("KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS")
6ebe44366bde ("KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR counter")
79f3e3b58386 ("KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter")
c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
0d23dc34a7ce ("x86/perf/core: Add pebs_capable to store valid PEBS_COUNTER_MASK value")
2c985527dd8d ("KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter")
39a4d779546a ("perf/x86/core: Pass "struct kvm_pmu *" to determine the guest values")
38904911e864 ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8631ef59b62290c7d88e7209e35dfb47f33f4902 Mon Sep 17 00:00:00 2001
From: Like Xu <likexu(a)tencent.com>
Date: Mon, 19 Sep 2022 17:10:06 +0800
Subject: [PATCH] KVM: x86/pmu: Do not speculatively query Intel GP PMCs that
don't exist yet
The SDM lists an architectural MSR IA32_CORE_CAPABILITIES (0xCF)
that limits the theoretical maximum value of the Intel GP PMC MSRs
allocated at 0xC1 to 14; likewise the Intel April 2022 SDM adds
IA32_OVERCLOCKING_STATUS at 0x195 which limits the number of event
selection MSRs to 15 (0x186-0x194).
Limiting the maximum number of counters to 14 or 18 based on the currently
allocated MSRs is clearly fragile, and it seems likely that Intel will
even place PMCs 8-15 at a completely different range of MSR indices.
So stop at the maximum number of GP PMCs supported today on Intel
processors.
There are some machines, like Intel P4 with non Architectural PMU, that
may indeed have 18 counters, but those counters are in a completely
different MSR address range and are not supported by KVM.
Cc: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Cc: stable(a)vger.kernel.org
Fixes: cf05a67b68b8 ("KVM: x86: omit "impossible" pmu MSRs from MSR list")
Suggested-by: Jim Mattson <jmattson(a)google.com>
Signed-off-by: Like Xu <likexu(a)tencent.com>
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Message-Id: <20220919091008.60695-1-likexu(a)tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5f5eb577d583..73716fab120f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1442,20 +1442,10 @@ static const u32 msrs_to_save_all[] = {
MSR_ARCH_PERFMON_PERFCTR0 + 2, MSR_ARCH_PERFMON_PERFCTR0 + 3,
MSR_ARCH_PERFMON_PERFCTR0 + 4, MSR_ARCH_PERFMON_PERFCTR0 + 5,
MSR_ARCH_PERFMON_PERFCTR0 + 6, MSR_ARCH_PERFMON_PERFCTR0 + 7,
- MSR_ARCH_PERFMON_PERFCTR0 + 8, MSR_ARCH_PERFMON_PERFCTR0 + 9,
- MSR_ARCH_PERFMON_PERFCTR0 + 10, MSR_ARCH_PERFMON_PERFCTR0 + 11,
- MSR_ARCH_PERFMON_PERFCTR0 + 12, MSR_ARCH_PERFMON_PERFCTR0 + 13,
- MSR_ARCH_PERFMON_PERFCTR0 + 14, MSR_ARCH_PERFMON_PERFCTR0 + 15,
- MSR_ARCH_PERFMON_PERFCTR0 + 16, MSR_ARCH_PERFMON_PERFCTR0 + 17,
MSR_ARCH_PERFMON_EVENTSEL0, MSR_ARCH_PERFMON_EVENTSEL1,
MSR_ARCH_PERFMON_EVENTSEL0 + 2, MSR_ARCH_PERFMON_EVENTSEL0 + 3,
MSR_ARCH_PERFMON_EVENTSEL0 + 4, MSR_ARCH_PERFMON_EVENTSEL0 + 5,
MSR_ARCH_PERFMON_EVENTSEL0 + 6, MSR_ARCH_PERFMON_EVENTSEL0 + 7,
- MSR_ARCH_PERFMON_EVENTSEL0 + 8, MSR_ARCH_PERFMON_EVENTSEL0 + 9,
- MSR_ARCH_PERFMON_EVENTSEL0 + 10, MSR_ARCH_PERFMON_EVENTSEL0 + 11,
- MSR_ARCH_PERFMON_EVENTSEL0 + 12, MSR_ARCH_PERFMON_EVENTSEL0 + 13,
- MSR_ARCH_PERFMON_EVENTSEL0 + 14, MSR_ARCH_PERFMON_EVENTSEL0 + 15,
- MSR_ARCH_PERFMON_EVENTSEL0 + 16, MSR_ARCH_PERFMON_EVENTSEL0 + 17,
MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
MSR_K7_EVNTSEL0, MSR_K7_EVNTSEL1, MSR_K7_EVNTSEL2, MSR_K7_EVNTSEL3,
@@ -7041,12 +7031,12 @@ static void kvm_init_msr_list(void)
intel_pt_validate_hw_cap(PT_CAP_num_address_ranges) * 2)
continue;
break;
- case MSR_ARCH_PERFMON_PERFCTR0 ... MSR_ARCH_PERFMON_PERFCTR0 + 17:
+ case MSR_ARCH_PERFMON_PERFCTR0 ... MSR_ARCH_PERFMON_PERFCTR0 + 7:
if (msrs_to_save_all[i] - MSR_ARCH_PERFMON_PERFCTR0 >=
min(INTEL_PMC_MAX_GENERIC, kvm_pmu_cap.num_counters_gp))
continue;
break;
- case MSR_ARCH_PERFMON_EVENTSEL0 ... MSR_ARCH_PERFMON_EVENTSEL0 + 17:
+ case MSR_ARCH_PERFMON_EVENTSEL0 ... MSR_ARCH_PERFMON_EVENTSEL0 + 7:
if (msrs_to_save_all[i] - MSR_ARCH_PERFMON_EVENTSEL0 >=
min(INTEL_PMC_MAX_GENERIC, kvm_pmu_cap.num_counters_gp))
continue;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8631ef59b622 ("KVM: x86/pmu: Do not speculatively query Intel GP PMCs that don't exist yet")
902caeb6841a ("KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS")
8183a538cd95 ("KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS")
6ebe44366bde ("KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR counter")
79f3e3b58386 ("KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter")
c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
0d23dc34a7ce ("x86/perf/core: Add pebs_capable to store valid PEBS_COUNTER_MASK value")
2c985527dd8d ("KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter")
39a4d779546a ("perf/x86/core: Pass "struct kvm_pmu *" to determine the guest values")
38904911e864 ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8631ef59b62290c7d88e7209e35dfb47f33f4902 Mon Sep 17 00:00:00 2001
From: Like Xu <likexu(a)tencent.com>
Date: Mon, 19 Sep 2022 17:10:06 +0800
Subject: [PATCH] KVM: x86/pmu: Do not speculatively query Intel GP PMCs that
don't exist yet
The SDM lists an architectural MSR IA32_CORE_CAPABILITIES (0xCF)
that limits the theoretical maximum value of the Intel GP PMC MSRs
allocated at 0xC1 to 14; likewise the Intel April 2022 SDM adds
IA32_OVERCLOCKING_STATUS at 0x195 which limits the number of event
selection MSRs to 15 (0x186-0x194).
Limiting the maximum number of counters to 14 or 18 based on the currently
allocated MSRs is clearly fragile, and it seems likely that Intel will
even place PMCs 8-15 at a completely different range of MSR indices.
So stop at the maximum number of GP PMCs supported today on Intel
processors.
There are some machines, like Intel P4 with non Architectural PMU, that
may indeed have 18 counters, but those counters are in a completely
different MSR address range and are not supported by KVM.
Cc: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Cc: stable(a)vger.kernel.org
Fixes: cf05a67b68b8 ("KVM: x86: omit "impossible" pmu MSRs from MSR list")
Suggested-by: Jim Mattson <jmattson(a)google.com>
Signed-off-by: Like Xu <likexu(a)tencent.com>
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Message-Id: <20220919091008.60695-1-likexu(a)tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5f5eb577d583..73716fab120f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1442,20 +1442,10 @@ static const u32 msrs_to_save_all[] = {
MSR_ARCH_PERFMON_PERFCTR0 + 2, MSR_ARCH_PERFMON_PERFCTR0 + 3,
MSR_ARCH_PERFMON_PERFCTR0 + 4, MSR_ARCH_PERFMON_PERFCTR0 + 5,
MSR_ARCH_PERFMON_PERFCTR0 + 6, MSR_ARCH_PERFMON_PERFCTR0 + 7,
- MSR_ARCH_PERFMON_PERFCTR0 + 8, MSR_ARCH_PERFMON_PERFCTR0 + 9,
- MSR_ARCH_PERFMON_PERFCTR0 + 10, MSR_ARCH_PERFMON_PERFCTR0 + 11,
- MSR_ARCH_PERFMON_PERFCTR0 + 12, MSR_ARCH_PERFMON_PERFCTR0 + 13,
- MSR_ARCH_PERFMON_PERFCTR0 + 14, MSR_ARCH_PERFMON_PERFCTR0 + 15,
- MSR_ARCH_PERFMON_PERFCTR0 + 16, MSR_ARCH_PERFMON_PERFCTR0 + 17,
MSR_ARCH_PERFMON_EVENTSEL0, MSR_ARCH_PERFMON_EVENTSEL1,
MSR_ARCH_PERFMON_EVENTSEL0 + 2, MSR_ARCH_PERFMON_EVENTSEL0 + 3,
MSR_ARCH_PERFMON_EVENTSEL0 + 4, MSR_ARCH_PERFMON_EVENTSEL0 + 5,
MSR_ARCH_PERFMON_EVENTSEL0 + 6, MSR_ARCH_PERFMON_EVENTSEL0 + 7,
- MSR_ARCH_PERFMON_EVENTSEL0 + 8, MSR_ARCH_PERFMON_EVENTSEL0 + 9,
- MSR_ARCH_PERFMON_EVENTSEL0 + 10, MSR_ARCH_PERFMON_EVENTSEL0 + 11,
- MSR_ARCH_PERFMON_EVENTSEL0 + 12, MSR_ARCH_PERFMON_EVENTSEL0 + 13,
- MSR_ARCH_PERFMON_EVENTSEL0 + 14, MSR_ARCH_PERFMON_EVENTSEL0 + 15,
- MSR_ARCH_PERFMON_EVENTSEL0 + 16, MSR_ARCH_PERFMON_EVENTSEL0 + 17,
MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
MSR_K7_EVNTSEL0, MSR_K7_EVNTSEL1, MSR_K7_EVNTSEL2, MSR_K7_EVNTSEL3,
@@ -7041,12 +7031,12 @@ static void kvm_init_msr_list(void)
intel_pt_validate_hw_cap(PT_CAP_num_address_ranges) * 2)
continue;
break;
- case MSR_ARCH_PERFMON_PERFCTR0 ... MSR_ARCH_PERFMON_PERFCTR0 + 17:
+ case MSR_ARCH_PERFMON_PERFCTR0 ... MSR_ARCH_PERFMON_PERFCTR0 + 7:
if (msrs_to_save_all[i] - MSR_ARCH_PERFMON_PERFCTR0 >=
min(INTEL_PMC_MAX_GENERIC, kvm_pmu_cap.num_counters_gp))
continue;
break;
- case MSR_ARCH_PERFMON_EVENTSEL0 ... MSR_ARCH_PERFMON_EVENTSEL0 + 17:
+ case MSR_ARCH_PERFMON_EVENTSEL0 ... MSR_ARCH_PERFMON_EVENTSEL0 + 7:
if (msrs_to_save_all[i] - MSR_ARCH_PERFMON_EVENTSEL0 >=
min(INTEL_PMC_MAX_GENERIC, kvm_pmu_cap.num_counters_gp))
continue;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
6d3085e4d89a ("KVM: x86/mmu: Block all page faults during kvm_zap_gfn_range()")
20ec3ebd707c ("KVM: Rename mmu_notifier_* to mmu_invalidate_*")
65e3b446bcce ("KVM: x86/mmu: Document the "rules" for using host_pfn_mapping_level()")
a8ac499bb6ab ("KVM: x86/mmu: Don't require refcounted "struct page" to create huge SPTEs")
9202aee816c8 ("KVM: x86/mmu: Rename pte_list_{destroy,remove}() to show they zap SPTEs")
a42989e7fbb0 ("KVM: x86/mmu: Directly "destroy" PTE list when recycling rmaps")
2ff9039a75a8 ("KVM: x86/mmu: Decouple rmap_add() and link_shadow_page() from kvm_vcpu")
6ec6509eea39 ("KVM: x86/mmu: Pass const memslot to rmap_add()")
5d49f08c2e08 ("KVM: x86/mmu: Shove refcounted page dependency into host_pfn_mapping_level()")
b14b2690c50e ("KVM: Rename/refactor kvm_is_reserved_pfn() to kvm_pfn_to_refcounted_page()")
284dc4930773 ("KVM: Take a 'struct page', not a pfn in kvm_is_zone_device_page()")
b1624f99aa8f ("KVM: Remove kvm_vcpu_gfn_to_page() and kvm_vcpu_gpa_to_page()")
6573a6910ce4 ("KVM: Don't WARN if kvm_pfn_to_page() encounters a "reserved" pfn")
8e1c69149f27 ("KVM: Avoid pfn_to_page() and vice versa when releasing pages")
a1040b0d42ac ("KVM: Don't set Accessed/Dirty bits for ZERO_PAGE")
b31455e96f00 ("Merge branch 'kvm-5.20-early-patches' into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6d3085e4d89ad7e6c7f1c6cf929d903393565861 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 11 Nov 2022 00:18:41 +0000
Subject: [PATCH] KVM: x86/mmu: Block all page faults during
kvm_zap_gfn_range()
When zapping a GFN range, pass 0 => ALL_ONES for the to-be-invalidated
range to effectively block all page faults while the zap is in-progress.
The invalidation helpers take a host virtual address, whereas zapping a
GFN obviously provides a guest physical address and with the wrong unit
of measurement (frame vs. byte).
Alternatively, KVM could walk all memslots to get the associated HVAs,
but thanks to SMM, that would require multiple lookups. And practically
speaking, kvm_zap_gfn_range() usage is quite rare and not a hot path,
e.g. MTRR and CR0.CD are almost guaranteed to be done only on vCPU0
during boot, and APICv inhibits are similarly infrequent operations.
Fixes: edb298c663fc ("KVM: x86/mmu: bump mmu notifier count in kvm_zap_gfn_range")
Reported-by: Chao Peng <chao.p.peng(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Cc: Maxim Levitsky <mlevitsk(a)redhat.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20221111001841.2412598-1-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6f81539061d6..1ccb769f62af 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6056,7 +6056,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
write_lock(&kvm->mmu_lock);
- kvm_mmu_invalidate_begin(kvm, gfn_start, gfn_end);
+ kvm_mmu_invalidate_begin(kvm, 0, -1ul);
flush = kvm_rmap_zap_gfn_range(kvm, gfn_start, gfn_end);
@@ -6070,7 +6070,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
gfn_end - gfn_start);
- kvm_mmu_invalidate_end(kvm, gfn_start, gfn_end);
+ kvm_mmu_invalidate_end(kvm, 0, -1ul);
write_unlock(&kvm->mmu_lock);
}
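
A simplified sketch of why the 0/-1ul range is used (field names below are stand-ins for the real kvm members, not the in-tree code): the invalidation window is compared against host virtual addresses on the page-fault path, so a GFN range would be in the wrong units; widening the window to cover everything forces every concurrent fault to retry until the zap finishes.

struct kvm_stub {
	int mmu_invalidate_in_progress;
	unsigned long mmu_invalidate_range_start;	/* hva, not gfn */
	unsigned long mmu_invalidate_range_end;		/* hva, not gfn */
};

/* Faults retry while their hva falls inside the invalidation window. */
static int fault_must_retry(struct kvm_stub *kvm, unsigned long hva)
{
	return kvm->mmu_invalidate_in_progress &&
	       hva >= kvm->mmu_invalidate_range_start &&
	       hva < kvm->mmu_invalidate_range_end;	/* 0..-1ul: always true */
}
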
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
0ec8ce073944 ("dmaengine: idxd: Do not enable user type Work Queue without Shared Virtual Addressing")
403a2e236538 ("dmaengine: idxd: change MSIX allocation based on per wq activation")
23a50c803565 ("dmaengine: idxd: fix descriptor flushing locking")
ec0d64231615 ("dmaengine: idxd: embed irq_entry in idxd_wq struct")
56fc39f5a367 ("dmaengine: idxd: handle interrupt handle revoked event")
f6d442f7088c ("dmaengine: idxd: handle invalid interrupt handle descriptors")
bd5970a0d01f ("dmaengine: idxd: create locked version of idxd_quiesce() call")
46c6df1c958e ("dmaengine: idxd: add helper for per interrupt handle drain")
eb0cf33a91b4 ("dmaengine: idxd: move interrupt handle assignment")
8b67426e0558 ("dmaengine: idxd: int handle management refactoring")
5d78abb6fbc9 ("dmaengine: idxd: rework descriptor free path on failure")
a3e340c1574b ("dmaengine: idxd: fix resource leak on dmaengine driver disable")
e530a9f3db41 ("dmaengine: idxd: reconfig device after device reset command")
88d97ea82cbe ("dmaengine: idxd: add halt interrupt support")
85f604af9c83 ("dmaengine: idxd: move out percpu_ref_exit() to ensure it's outside submission")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0ec8ce07394442d722806fe61b901a5b2b17249d Mon Sep 17 00:00:00 2001
From: Fenghua Yu <fenghua.yu(a)intel.com>
Date: Fri, 14 Oct 2022 15:25:41 -0700
Subject: [PATCH] dmaengine: idxd: Do not enable user type Work Queue without
Shared Virtual Addressing
When the idxd_user_drv driver is bound to a Work Queue (WQ) device
without IOMMU or with IOMMU Passthrough without Shared Virtual
Addressing (SVA), the application gains direct access to physical
memory via the device by programming physical address to a submitted
descriptor. This allows direct userspace read and write access to
arbitrary physical memory. This is inconsistent with the security
goals of a good kernel API.
Unlike vfio_pci driver, the IDXD char device driver does not provide any
ways to pin user pages and translate the address from user VA to IOVA or
PA without IOMMU SVA. Therefore the application has no way to instruct the
device to perform DMA function. This makes the char device not usable for
normal application usage.
Since user type WQ without SVA cannot be used for normal application usage
and presents the security issue, bind idxd_user_drv driver and enable user
type WQ only when SVA is enabled (i.e. user PASID is enabled).
Fixes: 448c3de8ac83 ("dmaengine: idxd: create user driver for wq 'device'")
Cc: stable(a)vger.kernel.org
Suggested-by: Arjan Van De Ven <arjan.van.de.ven(a)intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu(a)intel.com>
Reviewed-by: Dave Jiang <dave.jiang(a)intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel(a)redhat.com>
Link: https://lore.kernel.org/r/20221014222541.3912195-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index c2808fd081d6..a9b96b18772f 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -312,6 +312,24 @@ static int idxd_user_drv_probe(struct idxd_dev *idxd_dev)
if (idxd->state != IDXD_DEV_ENABLED)
return -ENXIO;
+ /*
+ * User type WQ is enabled only when SVA is enabled for two reasons:
+ * - If no IOMMU or IOMMU Passthrough without SVA, userspace
+ * can directly access physical address through the WQ.
+ * - The IDXD cdev driver does not provide any ways to pin
+ * user pages and translate the address from user VA to IOVA or
+ * PA without IOMMU SVA. Therefore the application has no way
+ * to instruct the device to perform DMA function. This makes
+ * the cdev not usable for normal application usage.
+ */
+ if (!device_user_pasid_enabled(idxd)) {
+ idxd->cmd_status = IDXD_SCMD_WQ_USER_NO_IOMMU;
+ dev_dbg(&idxd->pdev->dev,
+ "User type WQ cannot be enabled without SVA.\n");
+
+ return -EOPNOTSUPP;
+ }
+
mutex_lock(&wq->wq_lock);
wq->type = IDXD_WQT_USER;
rc = drv_enable_wq(wq);
diff --git a/include/uapi/linux/idxd.h b/include/uapi/linux/idxd.h
index 095299c75828..2b9e7feba3f3 100644
--- a/include/uapi/linux/idxd.h
+++ b/include/uapi/linux/idxd.h
@@ -29,6 +29,7 @@ enum idxd_scmd_stat {
IDXD_SCMD_WQ_NO_SIZE = 0x800e0000,
IDXD_SCMD_WQ_NO_PRIV = 0x800f0000,
IDXD_SCMD_WQ_IRQ_ERR = 0x80100000,
+ IDXD_SCMD_WQ_USER_NO_IOMMU = 0x80110000,
};
#define IDXD_SCMD_SOFTERR_MASK 0x80000000
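
The shape of the new gate, in isolation (illustration only; device_user_pasid_enabled() is the real helper the patch uses, and the failure is additionally recorded in idxd->cmd_status as IDXD_SCMD_WQ_USER_NO_IOMMU):

#include <errno.h>
#include <stdbool.h>

/*
 * Refuse to bind the user sub-driver unless IOMMU SVA is available,
 * since without it a user descriptor would carry raw physical addresses.
 */
static int user_wq_allowed(bool user_pasid_enabled)
{
	if (!user_pasid_enabled)
		return -EOPNOTSUPP;
	return 0;
}
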
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8625147cafaa ("hugetlbfs: don't delete error page from pagecache")
7e1813d48dd3 ("hugetlb: rename remove_huge_page to hugetlb_delete_from_page_cache")
1508062ecd55 ("hugetlbfs: Convert remove_inode_hugepages() to use filemap_get_folios()")
d9ef44de5d73 ("hugetlb: Convert huge_add_to_page_cache() to use a folio")
dd0f230a0a80 ("mm: hwpoison: refactor refcount check handling")
ea6d0630100b ("mm/hwpoison: do not lock page again when me_huge_page() successfully recovers")
171936ddaf97 ("mm/memory-failure: use a mutex to avoid memory_failure() races")
e32905e57358 ("userfaultfd: hugetlbfs: fix new flag usage in error path")
15b836536321 ("mm/hugetlb: remove unused variable pseudo_vma in remove_inode_hugepages()")
d4241a049ac0 ("mm/hugetlb: avoid calculating fault_mutex_hash in truncate_op case")
d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags")
585fc0d2871c ("mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page")
a8b2c2ce89d4 ("mm,hwpoison: take free pages off the buddy freelists")
5d1fd5dc877b ("mm,hwpoison: introduce MF_MSG_UNSPLIT_THP")
694bf0b0cdf9 ("mm,hwpoison: unify THP handling for hard and soft offline")
dd6e2402fad9 ("mm,hwpoison: kill put_hwpoison_page")
7e27f22c9e40 ("mm,hwpoison: unexport get_hwpoison_page and make it static")
bbe88753bd42 ("mm/hugetlb: make hugetlb migration callback CMA aware")
41b4dc14ee80 ("mm/gup: restrict CMA region by using allocation scope API")
19fc7bed252c ("mm/migrate: introduce a standard migration target allocation function")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8625147cafaa9ba74713d682f5185eb62cb2aedb Mon Sep 17 00:00:00 2001
From: James Houghton <jthoughton(a)google.com>
Date: Tue, 18 Oct 2022 20:01:25 +0000
Subject: [PATCH] hugetlbfs: don't delete error page from pagecache
This change is very similar to the change that was made for shmem [1], and
it solves the same problem but for HugeTLBFS instead.
Currently, when poison is found in a HugeTLB page, the page is removed
from the page cache. That means that attempting to map or read that
hugepage in the future will result in a new hugepage being allocated
instead of notifying the user that the page was poisoned. As [1] states,
this is effectively memory corruption.
The fix is to leave the page in the page cache. If the user attempts to
use a poisoned HugeTLB page with a syscall, the syscall will fail with
EIO, the same error code that shmem uses. For attempts to map the page,
the thread will get a BUS_MCEERR_AR SIGBUS.
[1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
Link: https://lkml.kernel.org/r/20221018200125.848471-1-jthoughton@google.com
Signed-off-by: James Houghton <jthoughton(a)google.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index dd54f67e47fd..df7772335dc0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -328,6 +328,12 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
} else {
unlock_page(page);
+ if (PageHWPoison(page)) {
+ put_page(page);
+ retval = -EIO;
+ break;
+ }
+
/*
* We have the page, copy it to user space buffer.
*/
@@ -1111,13 +1117,6 @@ static int hugetlbfs_migrate_folio(struct address_space *mapping,
static int hugetlbfs_error_remove_page(struct address_space *mapping,
struct page *page)
{
- struct inode *inode = mapping->host;
- pgoff_t index = page->index;
-
- hugetlb_delete_from_page_cache(page);
- if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
- hugetlb_fix_reserve_counts(inode);
-
return 0;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 546df97c31e4..e48f8ef45b17 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6111,6 +6111,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
ptl = huge_pte_lock(h, dst_mm, dst_pte);
+ ret = -EIO;
+ if (PageHWPoison(page))
+ goto out_release_unlock;
+
/*
* We allow to overwrite a pte marker: consider when both MISSING|WP
* registered, we firstly wr-protect a none pte which has no page cache
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 145bb561ddb3..bead6bccc7f2 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1080,6 +1080,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
int res;
struct page *hpage = compound_head(p);
struct address_space *mapping;
+ bool extra_pins = false;
if (!PageHuge(hpage))
return MF_DELAYED;
@@ -1087,6 +1088,8 @@ static int me_huge_page(struct page_state *ps, struct page *p)
mapping = page_mapping(hpage);
if (mapping) {
res = truncate_error_page(hpage, page_to_pfn(p), mapping);
+ /* The page is kept in page cache. */
+ extra_pins = true;
unlock_page(hpage);
} else {
unlock_page(hpage);
@@ -1104,7 +1107,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
}
}
- if (has_extra_refcount(ps, p, false))
+ if (has_extra_refcount(ps, p, extra_pins))
res = MF_FAILED;
return res;
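
A hedged way to exercise the new behaviour from userspace (test sketch only, not part of the patch; the hugetlbfs mount point, the 2 MiB page size and the use of MADV_HWPOISON - which needs CAP_SYS_ADMIN and CONFIG_MEMORY_FAILURE - are assumptions): after poisoning the backing page, read() on the file should now fail with EIO instead of quietly serving a freshly allocated hugepage.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	const size_t huge = 2UL * 1024 * 1024;		/* assume 2 MiB pages */
	char buf[4096];
	int fd = open("/dev/hugepages/poison-test", O_CREAT | O_RDWR, 0600);

	if (fd < 0 || ftruncate(fd, huge))
		return 1;

	void *p = mmap(NULL, huge, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;

	memset(p, 0xaa, huge);				/* fault the page in */

	if (madvise(p, huge, MADV_HWPOISON))		/* inject poison */
		perror("madvise");

	if (read(fd, buf, sizeof(buf)) < 0)
		perror("read");				/* expected: EIO */
	return 0;
}
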
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8625147cafaa ("hugetlbfs: don't delete error page from pagecache")
7e1813d48dd3 ("hugetlb: rename remove_huge_page to hugetlb_delete_from_page_cache")
1508062ecd55 ("hugetlbfs: Convert remove_inode_hugepages() to use filemap_get_folios()")
d9ef44de5d73 ("hugetlb: Convert huge_add_to_page_cache() to use a folio")
dd0f230a0a80 ("mm: hwpoison: refactor refcount check handling")
ea6d0630100b ("mm/hwpoison: do not lock page again when me_huge_page() successfully recovers")
171936ddaf97 ("mm/memory-failure: use a mutex to avoid memory_failure() races")
e32905e57358 ("userfaultfd: hugetlbfs: fix new flag usage in error path")
15b836536321 ("mm/hugetlb: remove unused variable pseudo_vma in remove_inode_hugepages()")
d4241a049ac0 ("mm/hugetlb: avoid calculating fault_mutex_hash in truncate_op case")
d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags")
585fc0d2871c ("mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page")
a8b2c2ce89d4 ("mm,hwpoison: take free pages off the buddy freelists")
5d1fd5dc877b ("mm,hwpoison: introduce MF_MSG_UNSPLIT_THP")
694bf0b0cdf9 ("mm,hwpoison: unify THP handling for hard and soft offline")
dd6e2402fad9 ("mm,hwpoison: kill put_hwpoison_page")
7e27f22c9e40 ("mm,hwpoison: unexport get_hwpoison_page and make it static")
bbe88753bd42 ("mm/hugetlb: make hugetlb migration callback CMA aware")
41b4dc14ee80 ("mm/gup: restrict CMA region by using allocation scope API")
19fc7bed252c ("mm/migrate: introduce a standard migration target allocation function")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8625147cafaa9ba74713d682f5185eb62cb2aedb Mon Sep 17 00:00:00 2001
From: James Houghton <jthoughton(a)google.com>
Date: Tue, 18 Oct 2022 20:01:25 +0000
Subject: [PATCH] hugetlbfs: don't delete error page from pagecache
This change is very similar to the change that was made for shmem [1], and
it solves the same problem but for HugeTLBFS instead.
Currently, when poison is found in a HugeTLB page, the page is removed
from the page cache. That means that attempting to map or read that
hugepage in the future will result in a new hugepage being allocated
instead of notifying the user that the page was poisoned. As [1] states,
this is effectively memory corruption.
The fix is to leave the page in the page cache. If the user attempts to
use a poisoned HugeTLB page with a syscall, the syscall will fail with
EIO, the same error code that shmem uses. For attempts to map the page,
the thread will get a BUS_MCEERR_AR SIGBUS.
[1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
Link: https://lkml.kernel.org/r/20221018200125.848471-1-jthoughton@google.com
Signed-off-by: James Houghton <jthoughton(a)google.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index dd54f67e47fd..df7772335dc0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -328,6 +328,12 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
} else {
unlock_page(page);
+ if (PageHWPoison(page)) {
+ put_page(page);
+ retval = -EIO;
+ break;
+ }
+
/*
* We have the page, copy it to user space buffer.
*/
@@ -1111,13 +1117,6 @@ static int hugetlbfs_migrate_folio(struct address_space *mapping,
static int hugetlbfs_error_remove_page(struct address_space *mapping,
struct page *page)
{
- struct inode *inode = mapping->host;
- pgoff_t index = page->index;
-
- hugetlb_delete_from_page_cache(page);
- if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
- hugetlb_fix_reserve_counts(inode);
-
return 0;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 546df97c31e4..e48f8ef45b17 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6111,6 +6111,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
ptl = huge_pte_lock(h, dst_mm, dst_pte);
+ ret = -EIO;
+ if (PageHWPoison(page))
+ goto out_release_unlock;
+
/*
* We allow to overwrite a pte marker: consider when both MISSING|WP
* registered, we firstly wr-protect a none pte which has no page cache
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 145bb561ddb3..bead6bccc7f2 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1080,6 +1080,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
int res;
struct page *hpage = compound_head(p);
struct address_space *mapping;
+ bool extra_pins = false;
if (!PageHuge(hpage))
return MF_DELAYED;
@@ -1087,6 +1088,8 @@ static int me_huge_page(struct page_state *ps, struct page *p)
mapping = page_mapping(hpage);
if (mapping) {
res = truncate_error_page(hpage, page_to_pfn(p), mapping);
+ /* The page is kept in page cache. */
+ extra_pins = true;
unlock_page(hpage);
} else {
unlock_page(hpage);
@@ -1104,7 +1107,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
}
}
- if (has_extra_refcount(ps, p, false))
+ if (has_extra_refcount(ps, p, extra_pins))
res = MF_FAILED;
return res;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8625147cafaa ("hugetlbfs: don't delete error page from pagecache")
7e1813d48dd3 ("hugetlb: rename remove_huge_page to hugetlb_delete_from_page_cache")
1508062ecd55 ("hugetlbfs: Convert remove_inode_hugepages() to use filemap_get_folios()")
d9ef44de5d73 ("hugetlb: Convert huge_add_to_page_cache() to use a folio")
dd0f230a0a80 ("mm: hwpoison: refactor refcount check handling")
ea6d0630100b ("mm/hwpoison: do not lock page again when me_huge_page() successfully recovers")
171936ddaf97 ("mm/memory-failure: use a mutex to avoid memory_failure() races")
e32905e57358 ("userfaultfd: hugetlbfs: fix new flag usage in error path")
15b836536321 ("mm/hugetlb: remove unused variable pseudo_vma in remove_inode_hugepages()")
d4241a049ac0 ("mm/hugetlb: avoid calculating fault_mutex_hash in truncate_op case")
d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags")
585fc0d2871c ("mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page")
a8b2c2ce89d4 ("mm,hwpoison: take free pages off the buddy freelists")
5d1fd5dc877b ("mm,hwpoison: introduce MF_MSG_UNSPLIT_THP")
694bf0b0cdf9 ("mm,hwpoison: unify THP handling for hard and soft offline")
dd6e2402fad9 ("mm,hwpoison: kill put_hwpoison_page")
7e27f22c9e40 ("mm,hwpoison: unexport get_hwpoison_page and make it static")
bbe88753bd42 ("mm/hugetlb: make hugetlb migration callback CMA aware")
41b4dc14ee80 ("mm/gup: restrict CMA region by using allocation scope API")
19fc7bed252c ("mm/migrate: introduce a standard migration target allocation function")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8625147cafaa9ba74713d682f5185eb62cb2aedb Mon Sep 17 00:00:00 2001
From: James Houghton <jthoughton(a)google.com>
Date: Tue, 18 Oct 2022 20:01:25 +0000
Subject: [PATCH] hugetlbfs: don't delete error page from pagecache
This change is very similar to the change that was made for shmem [1], and
it solves the same problem but for HugeTLBFS instead.
Currently, when poison is found in a HugeTLB page, the page is removed
from the page cache. That means that attempting to map or read that
hugepage in the future will result in a new hugepage being allocated
instead of notifying the user that the page was poisoned. As [1] states,
this is effectively memory corruption.
The fix is to leave the page in the page cache. If the user attempts to
use a poisoned HugeTLB page with a syscall, the syscall will fail with
EIO, the same error code that shmem uses. For attempts to map the page,
the thread will get a BUS_MCEERR_AR SIGBUS.
[1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
Link: https://lkml.kernel.org/r/20221018200125.848471-1-jthoughton@google.com
Signed-off-by: James Houghton <jthoughton(a)google.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index dd54f67e47fd..df7772335dc0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -328,6 +328,12 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
} else {
unlock_page(page);
+ if (PageHWPoison(page)) {
+ put_page(page);
+ retval = -EIO;
+ break;
+ }
+
/*
* We have the page, copy it to user space buffer.
*/
@@ -1111,13 +1117,6 @@ static int hugetlbfs_migrate_folio(struct address_space *mapping,
static int hugetlbfs_error_remove_page(struct address_space *mapping,
struct page *page)
{
- struct inode *inode = mapping->host;
- pgoff_t index = page->index;
-
- hugetlb_delete_from_page_cache(page);
- if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
- hugetlb_fix_reserve_counts(inode);
-
return 0;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 546df97c31e4..e48f8ef45b17 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6111,6 +6111,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
ptl = huge_pte_lock(h, dst_mm, dst_pte);
+ ret = -EIO;
+ if (PageHWPoison(page))
+ goto out_release_unlock;
+
/*
* We allow to overwrite a pte marker: consider when both MISSING|WP
* registered, we firstly wr-protect a none pte which has no page cache
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 145bb561ddb3..bead6bccc7f2 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1080,6 +1080,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
int res;
struct page *hpage = compound_head(p);
struct address_space *mapping;
+ bool extra_pins = false;
if (!PageHuge(hpage))
return MF_DELAYED;
@@ -1087,6 +1088,8 @@ static int me_huge_page(struct page_state *ps, struct page *p)
mapping = page_mapping(hpage);
if (mapping) {
res = truncate_error_page(hpage, page_to_pfn(p), mapping);
+ /* The page is kept in page cache. */
+ extra_pins = true;
unlock_page(hpage);
} else {
unlock_page(hpage);
@@ -1104,7 +1107,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
}
}
- if (has_extra_refcount(ps, p, false))
+ if (has_extra_refcount(ps, p, extra_pins))
res = MF_FAILED;
return res;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8625147cafaa ("hugetlbfs: don't delete error page from pagecache")
7e1813d48dd3 ("hugetlb: rename remove_huge_page to hugetlb_delete_from_page_cache")
1508062ecd55 ("hugetlbfs: Convert remove_inode_hugepages() to use filemap_get_folios()")
d9ef44de5d73 ("hugetlb: Convert huge_add_to_page_cache() to use a folio")
dd0f230a0a80 ("mm: hwpoison: refactor refcount check handling")
ea6d0630100b ("mm/hwpoison: do not lock page again when me_huge_page() successfully recovers")
171936ddaf97 ("mm/memory-failure: use a mutex to avoid memory_failure() races")
e32905e57358 ("userfaultfd: hugetlbfs: fix new flag usage in error path")
15b836536321 ("mm/hugetlb: remove unused variable pseudo_vma in remove_inode_hugepages()")
d4241a049ac0 ("mm/hugetlb: avoid calculating fault_mutex_hash in truncate_op case")
d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags")
585fc0d2871c ("mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page")
a8b2c2ce89d4 ("mm,hwpoison: take free pages off the buddy freelists")
5d1fd5dc877b ("mm,hwpoison: introduce MF_MSG_UNSPLIT_THP")
694bf0b0cdf9 ("mm,hwpoison: unify THP handling for hard and soft offline")
dd6e2402fad9 ("mm,hwpoison: kill put_hwpoison_page")
7e27f22c9e40 ("mm,hwpoison: unexport get_hwpoison_page and make it static")
bbe88753bd42 ("mm/hugetlb: make hugetlb migration callback CMA aware")
41b4dc14ee80 ("mm/gup: restrict CMA region by using allocation scope API")
19fc7bed252c ("mm/migrate: introduce a standard migration target allocation function")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8625147cafaa9ba74713d682f5185eb62cb2aedb Mon Sep 17 00:00:00 2001
From: James Houghton <jthoughton(a)google.com>
Date: Tue, 18 Oct 2022 20:01:25 +0000
Subject: [PATCH] hugetlbfs: don't delete error page from pagecache
This change is very similar to the change that was made for shmem [1], and
it solves the same problem but for HugeTLBFS instead.
Currently, when poison is found in a HugeTLB page, the page is removed
from the page cache. That means that attempting to map or read that
hugepage in the future will result in a new hugepage being allocated
instead of notifying the user that the page was poisoned. As [1] states,
this is effectively memory corruption.
The fix is to leave the page in the page cache. If the user attempts to
use a poisoned HugeTLB page with a syscall, the syscall will fail with
EIO, the same error code that shmem uses. For attempts to map the page,
the thread will get a BUS_MCEERR_AR SIGBUS.
[1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
Link: https://lkml.kernel.org/r/20221018200125.848471-1-jthoughton@google.com
Signed-off-by: James Houghton <jthoughton(a)google.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index dd54f67e47fd..df7772335dc0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -328,6 +328,12 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
} else {
unlock_page(page);
+ if (PageHWPoison(page)) {
+ put_page(page);
+ retval = -EIO;
+ break;
+ }
+
/*
* We have the page, copy it to user space buffer.
*/
@@ -1111,13 +1117,6 @@ static int hugetlbfs_migrate_folio(struct address_space *mapping,
static int hugetlbfs_error_remove_page(struct address_space *mapping,
struct page *page)
{
- struct inode *inode = mapping->host;
- pgoff_t index = page->index;
-
- hugetlb_delete_from_page_cache(page);
- if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
- hugetlb_fix_reserve_counts(inode);
-
return 0;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 546df97c31e4..e48f8ef45b17 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6111,6 +6111,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
ptl = huge_pte_lock(h, dst_mm, dst_pte);
+ ret = -EIO;
+ if (PageHWPoison(page))
+ goto out_release_unlock;
+
/*
* We allow to overwrite a pte marker: consider when both MISSING|WP
* registered, we firstly wr-protect a none pte which has no page cache
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 145bb561ddb3..bead6bccc7f2 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1080,6 +1080,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
int res;
struct page *hpage = compound_head(p);
struct address_space *mapping;
+ bool extra_pins = false;
if (!PageHuge(hpage))
return MF_DELAYED;
@@ -1087,6 +1088,8 @@ static int me_huge_page(struct page_state *ps, struct page *p)
mapping = page_mapping(hpage);
if (mapping) {
res = truncate_error_page(hpage, page_to_pfn(p), mapping);
+ /* The page is kept in page cache. */
+ extra_pins = true;
unlock_page(hpage);
} else {
unlock_page(hpage);
@@ -1104,7 +1107,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
}
}
- if (has_extra_refcount(ps, p, false))
+ if (has_extra_refcount(ps, p, extra_pins))
res = MF_FAILED;
return res;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8625147cafaa ("hugetlbfs: don't delete error page from pagecache")
7e1813d48dd3 ("hugetlb: rename remove_huge_page to hugetlb_delete_from_page_cache")
1508062ecd55 ("hugetlbfs: Convert remove_inode_hugepages() to use filemap_get_folios()")
d9ef44de5d73 ("hugetlb: Convert huge_add_to_page_cache() to use a folio")
dd0f230a0a80 ("mm: hwpoison: refactor refcount check handling")
ea6d0630100b ("mm/hwpoison: do not lock page again when me_huge_page() successfully recovers")
171936ddaf97 ("mm/memory-failure: use a mutex to avoid memory_failure() races")
e32905e57358 ("userfaultfd: hugetlbfs: fix new flag usage in error path")
15b836536321 ("mm/hugetlb: remove unused variable pseudo_vma in remove_inode_hugepages()")
d4241a049ac0 ("mm/hugetlb: avoid calculating fault_mutex_hash in truncate_op case")
d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags")
585fc0d2871c ("mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page")
a8b2c2ce89d4 ("mm,hwpoison: take free pages off the buddy freelists")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8625147cafaa9ba74713d682f5185eb62cb2aedb Mon Sep 17 00:00:00 2001
From: James Houghton <jthoughton(a)google.com>
Date: Tue, 18 Oct 2022 20:01:25 +0000
Subject: [PATCH] hugetlbfs: don't delete error page from pagecache
This change is very similar to the change that was made for shmem [1], and
it solves the same problem but for HugeTLBFS instead.
Currently, when poison is found in a HugeTLB page, the page is removed
from the page cache. That means that attempting to map or read that
hugepage in the future will result in a new hugepage being allocated
instead of notifying the user that the page was poisoned. As [1] states,
this is effectively memory corruption.
The fix is to leave the page in the page cache. If the user attempts to
use a poisoned HugeTLB page with a syscall, the syscall will fail with
EIO, the same error code that shmem uses. For attempts to map the page,
the thread will get a BUS_MCEERR_AR SIGBUS.
[1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
Link: https://lkml.kernel.org/r/20221018200125.848471-1-jthoughton@google.com
Signed-off-by: James Houghton <jthoughton(a)google.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index dd54f67e47fd..df7772335dc0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -328,6 +328,12 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
} else {
unlock_page(page);
+ if (PageHWPoison(page)) {
+ put_page(page);
+ retval = -EIO;
+ break;
+ }
+
/*
* We have the page, copy it to user space buffer.
*/
@@ -1111,13 +1117,6 @@ static int hugetlbfs_migrate_folio(struct address_space *mapping,
static int hugetlbfs_error_remove_page(struct address_space *mapping,
struct page *page)
{
- struct inode *inode = mapping->host;
- pgoff_t index = page->index;
-
- hugetlb_delete_from_page_cache(page);
- if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
- hugetlb_fix_reserve_counts(inode);
-
return 0;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 546df97c31e4..e48f8ef45b17 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6111,6 +6111,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
ptl = huge_pte_lock(h, dst_mm, dst_pte);
+ ret = -EIO;
+ if (PageHWPoison(page))
+ goto out_release_unlock;
+
/*
* We allow to overwrite a pte marker: consider when both MISSING|WP
* registered, we firstly wr-protect a none pte which has no page cache
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 145bb561ddb3..bead6bccc7f2 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1080,6 +1080,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
int res;
struct page *hpage = compound_head(p);
struct address_space *mapping;
+ bool extra_pins = false;
if (!PageHuge(hpage))
return MF_DELAYED;
@@ -1087,6 +1088,8 @@ static int me_huge_page(struct page_state *ps, struct page *p)
mapping = page_mapping(hpage);
if (mapping) {
res = truncate_error_page(hpage, page_to_pfn(p), mapping);
+ /* The page is kept in page cache. */
+ extra_pins = true;
unlock_page(hpage);
} else {
unlock_page(hpage);
@@ -1104,7 +1107,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
}
}
- if (has_extra_refcount(ps, p, false))
+ if (has_extra_refcount(ps, p, extra_pins))
res = MF_FAILED;
return res;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8625147cafaa ("hugetlbfs: don't delete error page from pagecache")
7e1813d48dd3 ("hugetlb: rename remove_huge_page to hugetlb_delete_from_page_cache")
1508062ecd55 ("hugetlbfs: Convert remove_inode_hugepages() to use filemap_get_folios()")
d9ef44de5d73 ("hugetlb: Convert huge_add_to_page_cache() to use a folio")
dd0f230a0a80 ("mm: hwpoison: refactor refcount check handling")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8625147cafaa9ba74713d682f5185eb62cb2aedb Mon Sep 17 00:00:00 2001
From: James Houghton <jthoughton(a)google.com>
Date: Tue, 18 Oct 2022 20:01:25 +0000
Subject: [PATCH] hugetlbfs: don't delete error page from pagecache
This change is very similar to the change that was made for shmem [1], and
it solves the same problem but for HugeTLBFS instead.
Currently, when poison is found in a HugeTLB page, the page is removed
from the page cache. That means that attempting to map or read that
hugepage in the future will result in a new hugepage being allocated
instead of notifying the user that the page was poisoned. As [1] states,
this is effectively memory corruption.
The fix is to leave the page in the page cache. If the user attempts to
use a poisoned HugeTLB page with a syscall, the syscall will fail with
EIO, the same error code that shmem uses. For attempts to map the page,
the thread will get a BUS_MCEERR_AR SIGBUS.
[1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
Link: https://lkml.kernel.org/r/20221018200125.848471-1-jthoughton@google.com
Signed-off-by: James Houghton <jthoughton(a)google.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index dd54f67e47fd..df7772335dc0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -328,6 +328,12 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
} else {
unlock_page(page);
+ if (PageHWPoison(page)) {
+ put_page(page);
+ retval = -EIO;
+ break;
+ }
+
/*
* We have the page, copy it to user space buffer.
*/
@@ -1111,13 +1117,6 @@ static int hugetlbfs_migrate_folio(struct address_space *mapping,
static int hugetlbfs_error_remove_page(struct address_space *mapping,
struct page *page)
{
- struct inode *inode = mapping->host;
- pgoff_t index = page->index;
-
- hugetlb_delete_from_page_cache(page);
- if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
- hugetlb_fix_reserve_counts(inode);
-
return 0;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 546df97c31e4..e48f8ef45b17 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6111,6 +6111,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
ptl = huge_pte_lock(h, dst_mm, dst_pte);
+ ret = -EIO;
+ if (PageHWPoison(page))
+ goto out_release_unlock;
+
/*
* We allow to overwrite a pte marker: consider when both MISSING|WP
* registered, we firstly wr-protect a none pte which has no page cache
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 145bb561ddb3..bead6bccc7f2 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1080,6 +1080,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
int res;
struct page *hpage = compound_head(p);
struct address_space *mapping;
+ bool extra_pins = false;
if (!PageHuge(hpage))
return MF_DELAYED;
@@ -1087,6 +1088,8 @@ static int me_huge_page(struct page_state *ps, struct page *p)
mapping = page_mapping(hpage);
if (mapping) {
res = truncate_error_page(hpage, page_to_pfn(p), mapping);
+ /* The page is kept in page cache. */
+ extra_pins = true;
unlock_page(hpage);
} else {
unlock_page(hpage);
@@ -1104,7 +1107,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
}
}
- if (has_extra_refcount(ps, p, false))
+ if (has_extra_refcount(ps, p, extra_pins))
res = MF_FAILED;
return res;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
fb1dec44c675 ("mms: sdhci-esdhc-imx: Fix SDHCI_RESET_ALL for CQHCI")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fb1dec44c6750bb414f47b929c8c175a1a127c31 Mon Sep 17 00:00:00 2001
From: Brian Norris <briannorris(a)chromium.org>
Date: Wed, 26 Oct 2022 12:42:06 -0700
Subject: [PATCH] mms: sdhci-esdhc-imx: Fix SDHCI_RESET_ALL for CQHCI
[[ NOTE: this is completely untested by the author, but included solely
because, as noted in commit df57d73276b8 ("mmc: sdhci-pci: Fix
SDHCI_RESET_ALL for CQHCI for Intel GLK-based controllers"), "other
drivers using CQHCI might benefit from a similar change, if they
also have CQHCI reset by SDHCI_RESET_ALL." We've now seen the same
bug on at least MSM, Arasan, and Intel hardware. ]]
SDHCI_RESET_ALL resets will reset the hardware CQE state, but we aren't
tracking that properly in software. When out of sync, we may trigger
various timeouts.
It's not typical to perform resets while CQE is enabled, but this may
occur in some suspend or error recovery scenarios.
Include this fix by way of the new sdhci_and_cqhci_reset() helper.
This patch depends on (and should not compile without) the patch
entitled "mmc: cqhci: Provide helper for resetting both SDHCI and
CQHCI".
Fixes: bb6e358169bf ("mmc: sdhci-esdhc-imx: add CMDQ support")
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
Reviewed-by: Haibo Chen <haibo.chen(a)nxp.com>
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20221026124150.v4.4.I7d01f9ad11bacdc9213dee61b791…
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c b/drivers/mmc/host/sdhci-esdhc-imx.c
index 747df79d90ee..89225faa242a 100644
--- a/drivers/mmc/host/sdhci-esdhc-imx.c
+++ b/drivers/mmc/host/sdhci-esdhc-imx.c
@@ -25,6 +25,7 @@
#include <linux/of_device.h>
#include <linux/pinctrl/consumer.h>
#include <linux/pm_runtime.h>
+#include "sdhci-cqhci.h"
#include "sdhci-pltfm.h"
#include "sdhci-esdhc.h"
#include "cqhci.h"
@@ -1288,7 +1289,7 @@ static void esdhc_set_uhs_signaling(struct sdhci_host *host, unsigned timing)
static void esdhc_reset(struct sdhci_host *host, u8 mask)
{
- sdhci_reset(host, mask);
+ sdhci_and_cqhci_reset(host, mask);
sdhci_writel(host, host->ier, SDHCI_INT_ENABLE);
sdhci_writel(host, host->ier, SDHCI_SIGNAL_ENABLE);
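
For context, the helper referenced above (added by "mmc: cqhci: Provide helper for resetting both SDHCI and CQHCI") looks approximately like the sketch below: on a full controller reset it first deactivates the CQE layer so the software state matches the hardware, then performs the normal SDHCI reset. Treat this as an approximation rather than the exact in-tree definition.

/* Approximate shape of sdhci_and_cqhci_reset() from sdhci-cqhci.h. */
static inline void sdhci_and_cqhci_reset(struct sdhci_host *host, u8 mask)
{
	if ((host->mmc->caps2 & MMC_CAP2_CQE) && (mask & SDHCI_RESET_ALL) &&
	    host->mmc->cqe_private)
		cqhci_deactivate(host->mmc);

	sdhci_reset(host, mask);
}
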
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f90daa975911 ("drm/i915/dmabuf: fix sg_table handling in map_dma_buf")
10be98a77c55 ("drm/i915: Move more GEM objects under gem/")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f90daa975911961b65070ec72bd7dd8d448f9ef7 Mon Sep 17 00:00:00 2001
From: Matthew Auld <matthew.auld(a)intel.com>
Date: Fri, 28 Oct 2022 16:50:26 +0100
Subject: [PATCH] drm/i915/dmabuf: fix sg_table handling in map_dma_buf
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
We need to iterate over the original entries here for the sg_table,
pulling out the struct page for each one, to be remapped. However
currently this incorrectly iterates over the final dma mapped entries,
which is likely just one gigantic sg entry if the iommu is enabled,
leading to us only mapping the first struct page (and any physically
contiguous pages following it), even if there is potentially lots more
data to follow.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7306
Fixes: 1286ff739773 ("i915: add dmabuf/prime buffer sharing support.")
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin(a)intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: Michael J. Ruhl <michael.j.ruhl(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v3.5+
Reviewed-by: Michael J. Ruhl <michael.j.ruhl(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221028155029.494736-1-matth…
(cherry picked from commit 28d52f99bbca7227008cf580c9194c9b3516968e)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index f5062d0c6333..824971a1ceec 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -40,13 +40,13 @@ static struct sg_table *i915_gem_map_dma_buf(struct dma_buf_attachment *attachme
goto err;
}
- ret = sg_alloc_table(st, obj->mm.pages->nents, GFP_KERNEL);
+ ret = sg_alloc_table(st, obj->mm.pages->orig_nents, GFP_KERNEL);
if (ret)
goto err_free;
src = obj->mm.pages->sgl;
dst = st->sgl;
- for (i = 0; i < obj->mm.pages->nents; i++) {
+ for (i = 0; i < obj->mm.pages->orig_nents; i++) {
sg_set_page(dst, sg_page(src), src->length, 0);
dst = sg_next(dst);
src = sg_next(src);
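
The general rule the fix applies, as a simplified sketch (not the i915 code itself): after DMA mapping, nents and orig_nents describe two different views of the same table, and anything that needs the CPU-side struct page entries must walk orig_nents.

#include <linux/scatterlist.h>

/*
 * src->nents      : number of DMA-mapped segments (may be merged by the IOMMU)
 * src->orig_nents : number of original CPU-side page entries
 * Copying struct page pointers therefore has to iterate orig_nents.
 */
static void copy_sg_pages(struct sg_table *dst, struct sg_table *src)
{
	struct scatterlist *s = src->sgl, *d = dst->sgl;
	unsigned int i;

	for (i = 0; i < src->orig_nents; i++) {		/* not src->nents */
		sg_set_page(d, sg_page(s), s->length, 0);
		d = sg_next(d);
		s = sg_next(s);
	}
}
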
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8cccf05fe857 ("nilfs2: fix use-after-free bug of ns_writer on remount")
1751e8a6cb93 ("Rename superblock flags (MS_xyz -> SB_xyz)")
b04a23421bf6 ("Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8cccf05fe857a18ee26e20d11a8455a73ffd4efd Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Fri, 4 Nov 2022 23:29:59 +0900
Subject: [PATCH] nilfs2: fix use-after-free bug of ns_writer on remount
If a nilfs2 filesystem is downgraded to read-only due to metadata
corruption on disk and is remounted read/write, or if emergency read-only
remount is performed, detaching a log writer and synchronizing the
filesystem can be done at the same time.
In these cases, use-after-free of the log writer (hereinafter
nilfs->ns_writer) can happen as shown in the scenario below:
Task1                                    Task2
--------------------------------         ------------------------------
nilfs_construct_segment
  nilfs_segctor_sync
    init_wait
    init_waitqueue_entry
    add_wait_queue
    schedule
                                          nilfs_remount (R/W remount case)
                                            nilfs_attach_log_writer
                                              nilfs_detach_log_writer
                                                nilfs_segctor_destroy
                                                  kfree
  finish_wait
    _raw_spin_lock_irqsave
      __raw_spin_lock_irqsave
        do_raw_spin_lock
          debug_spin_lock_before  <-- use-after-free
While Task1 is sleeping, nilfs->ns_writer is freed by Task2. After Task1
wakes up, it accesses nilfs->ns_writer, which has already been freed. This
scenario diagram is based on Shigeru Yoshida's post [1].
This patch fixes the issue by not detaching nilfs->ns_writer on remount so
that this UAF race doesn't happen. Along with this change, this patch
also inserts a few necessary read-only checks using the superblock instance
where previously only the ns_writer pointer was used to check if the
filesystem is read-only.
Link: https://syzkaller.appspot.com/bug?id=79a4c002e960419ca173d55e863bd09e8112df…
Link: https://lkml.kernel.org/r/20221103141759.1836312-1-syoshida@redhat.com [1]
Link: https://lkml.kernel.org/r/20221104142959.28296-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+f816fa82f8783f7a02bb(a)syzkaller.appspotmail.com
Reported-by: Shigeru Yoshida <syoshida(a)redhat.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index b4cebad21b48..3335ef352915 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -317,7 +317,7 @@ void nilfs_relax_pressure_in_lock(struct super_block *sb)
struct the_nilfs *nilfs = sb->s_fs_info;
struct nilfs_sc_info *sci = nilfs->ns_writer;
- if (!sci || !sci->sc_flush_request)
+ if (sb_rdonly(sb) || unlikely(!sci) || !sci->sc_flush_request)
return;
set_bit(NILFS_SC_PRIOR_FLUSH, &sci->sc_flags);
@@ -2242,7 +2242,7 @@ int nilfs_construct_segment(struct super_block *sb)
struct nilfs_sc_info *sci = nilfs->ns_writer;
struct nilfs_transaction_info *ti;
- if (!sci)
+ if (sb_rdonly(sb) || unlikely(!sci))
return -EROFS;
/* A call inside transactions causes a deadlock. */
@@ -2280,7 +2280,7 @@ int nilfs_construct_dsync_segment(struct super_block *sb, struct inode *inode,
struct nilfs_transaction_info ti;
int err = 0;
- if (!sci)
+ if (sb_rdonly(sb) || unlikely(!sci))
return -EROFS;
nilfs_transaction_lock(sb, &ti, 0);
@@ -2776,11 +2776,12 @@ int nilfs_attach_log_writer(struct super_block *sb, struct nilfs_root *root)
if (nilfs->ns_writer) {
/*
- * This happens if the filesystem was remounted
- * read/write after nilfs_error degenerated it into a
- * read-only mount.
+ * This happens if the filesystem is made read-only by
+ * __nilfs_error or nilfs_remount and then remounted
+ * read/write. In these cases, reuse the existing
+ * writer.
*/
- nilfs_detach_log_writer(sb);
+ return 0;
}
nilfs->ns_writer = nilfs_segctor_new(sb, root);
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index ba108f915391..6edb6e0dd61f 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -1133,8 +1133,6 @@ static int nilfs_remount(struct super_block *sb, int *flags, char *data)
if ((bool)(*flags & SB_RDONLY) == sb_rdonly(sb))
goto out;
if (*flags & SB_RDONLY) {
- /* Shutting down log writer */
- nilfs_detach_log_writer(sb);
sb->s_flags |= SB_RDONLY;
/*
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8cccf05fe857 ("nilfs2: fix use-after-free bug of ns_writer on remount")
1751e8a6cb93 ("Rename superblock flags (MS_xyz -> SB_xyz)")
b04a23421bf6 ("Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8cccf05fe857a18ee26e20d11a8455a73ffd4efd Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Fri, 4 Nov 2022 23:29:59 +0900
Subject: [PATCH] nilfs2: fix use-after-free bug of ns_writer on remount
If a nilfs2 filesystem is downgraded to read-only due to metadata
corruption on disk and is remounted read/write, or if emergency read-only
remount is performed, detaching a log writer and synchronizing the
filesystem can be done at the same time.
In these cases, use-after-free of the log writer (hereinafter
nilfs->ns_writer) can happen as shown in the scenario below:
Task1                                    Task2
--------------------------------         ------------------------------
nilfs_construct_segment
  nilfs_segctor_sync
    init_wait
    init_waitqueue_entry
    add_wait_queue
    schedule
                                          nilfs_remount (R/W remount case)
                                            nilfs_attach_log_writer
                                              nilfs_detach_log_writer
                                                nilfs_segctor_destroy
                                                  kfree
  finish_wait
    _raw_spin_lock_irqsave
      __raw_spin_lock_irqsave
        do_raw_spin_lock
          debug_spin_lock_before  <-- use-after-free
While Task1 is sleeping, nilfs->ns_writer is freed by Task2. After Task1
wakes up, it accesses nilfs->ns_writer, which has already been freed. This
scenario diagram is based on Shigeru Yoshida's post [1].
This patch fixes the issue by not detaching nilfs->ns_writer on remount so
that this UAF race doesn't happen. Along with this change, this patch
also inserts a few necessary read-only checks using the superblock instance
where previously only the ns_writer pointer was used to check if the
filesystem is read-only.
Link: https://syzkaller.appspot.com/bug?id=79a4c002e960419ca173d55e863bd09e8112df…
Link: https://lkml.kernel.org/r/20221103141759.1836312-1-syoshida@redhat.com [1]
Link: https://lkml.kernel.org/r/20221104142959.28296-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+f816fa82f8783f7a02bb(a)syzkaller.appspotmail.com
Reported-by: Shigeru Yoshida <syoshida(a)redhat.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index b4cebad21b48..3335ef352915 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -317,7 +317,7 @@ void nilfs_relax_pressure_in_lock(struct super_block *sb)
struct the_nilfs *nilfs = sb->s_fs_info;
struct nilfs_sc_info *sci = nilfs->ns_writer;
- if (!sci || !sci->sc_flush_request)
+ if (sb_rdonly(sb) || unlikely(!sci) || !sci->sc_flush_request)
return;
set_bit(NILFS_SC_PRIOR_FLUSH, &sci->sc_flags);
@@ -2242,7 +2242,7 @@ int nilfs_construct_segment(struct super_block *sb)
struct nilfs_sc_info *sci = nilfs->ns_writer;
struct nilfs_transaction_info *ti;
- if (!sci)
+ if (sb_rdonly(sb) || unlikely(!sci))
return -EROFS;
/* A call inside transactions causes a deadlock. */
@@ -2280,7 +2280,7 @@ int nilfs_construct_dsync_segment(struct super_block *sb, struct inode *inode,
struct nilfs_transaction_info ti;
int err = 0;
- if (!sci)
+ if (sb_rdonly(sb) || unlikely(!sci))
return -EROFS;
nilfs_transaction_lock(sb, &ti, 0);
@@ -2776,11 +2776,12 @@ int nilfs_attach_log_writer(struct super_block *sb, struct nilfs_root *root)
if (nilfs->ns_writer) {
/*
- * This happens if the filesystem was remounted
- * read/write after nilfs_error degenerated it into a
- * read-only mount.
+ * This happens if the filesystem is made read-only by
+ * __nilfs_error or nilfs_remount and then remounted
+ * read/write. In these cases, reuse the existing
+ * writer.
*/
- nilfs_detach_log_writer(sb);
+ return 0;
}
nilfs->ns_writer = nilfs_segctor_new(sb, root);
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index ba108f915391..6edb6e0dd61f 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -1133,8 +1133,6 @@ static int nilfs_remount(struct super_block *sb, int *flags, char *data)
if ((bool)(*flags & SB_RDONLY) == sb_rdonly(sb))
goto out;
if (*flags & SB_RDONLY) {
- /* Shutting down log writer */
- nilfs_detach_log_writer(sb);
sb->s_flags |= SB_RDONLY;
/*
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
ebb5fd38f411 ("mmc: cqhci: Provide helper for resetting both SDHCI and CQHCI")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ebb5fd38f41132e6924cb33b647337f4a5d5360c Mon Sep 17 00:00:00 2001
From: Brian Norris <briannorris(a)chromium.org>
Date: Wed, 26 Oct 2022 12:42:03 -0700
Subject: [PATCH] mmc: cqhci: Provide helper for resetting both SDHCI and CQHCI
Several SDHCI drivers need to deactivate command queueing in their reset
hook (see sdhci_cqhci_reset() / sdhci-pci-core.c, for example), and
several more are coming.
Those reset implementations have some small subtleties (e.g., ordering
of initialization of SDHCI vs. CQHCI might leave us resetting with a
NULL ->cqe_private), and are often identical across different host
drivers.
We also don't want to force a dependency between SDHCI and CQHCI, or
vice versa; non-SDHCI drivers use CQHCI, and SDHCI drivers might support
command queueing through some other means.
So, implement a small helper, to avoid repeating the same mistakes in
different drivers. Simply stick it in a header, because it's so small it
doesn't deserve its own module right now, and inlining to each driver is
pretty reasonable.
This is marked for -stable, as it is an important prerequisite patch for
several SDHCI controller bugfixes that follow.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
Reviewed-by: Florian Fainelli <f.fainelli(a)gmail.com>
Link: https://lore.kernel.org/r/20221026124150.v4.1.Ie85faa09432bfe1b0890d8c24ff9…
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/sdhci-cqhci.h b/drivers/mmc/host/sdhci-cqhci.h
new file mode 100644
index 000000000000..cf8e7ba71bbd
--- /dev/null
+++ b/drivers/mmc/host/sdhci-cqhci.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright 2022 The Chromium OS Authors
+ *
+ * Support that applies to the combination of SDHCI and CQHCI, while not
+ * expressing a dependency between the two modules.
+ */
+
+#ifndef __MMC_HOST_SDHCI_CQHCI_H__
+#define __MMC_HOST_SDHCI_CQHCI_H__
+
+#include "cqhci.h"
+#include "sdhci.h"
+
+static inline void sdhci_and_cqhci_reset(struct sdhci_host *host, u8 mask)
+{
+ if ((host->mmc->caps2 & MMC_CAP2_CQE) && (mask & SDHCI_RESET_ALL) &&
+ host->mmc->cqe_private)
+ cqhci_deactivate(host->mmc);
+
+ sdhci_reset(host, mask);
+}
+
+#endif /* __MMC_HOST_SDHCI_CQHCI_H__ */
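
As a rough illustration of how a host driver is expected to use this helper (the Arasan, brcmstb and esdhc-imx fixes in this digest do exactly this), a driver's reset hook simply calls sdhci_and_cqhci_reset() in place of sdhci_reset(). The example_* names below are hypothetical; the sdhci_ops members and generic helpers are the existing SDHCI ones.

#include "sdhci-cqhci.h"
#include "sdhci-pltfm.h"

/*
 * Hypothetical driver reset hook: deactivate CQHCI (if active) before
 * the SDHCI reset, then do any vendor-specific re-initialisation.
 */
static void example_sdhci_reset(struct sdhci_host *host, u8 mask)
{
	sdhci_and_cqhci_reset(host, mask);

	/* vendor-specific register restore after reset would go here */
}

static const struct sdhci_ops example_sdhci_ops = {
	.reset			= example_sdhci_reset,
	.set_clock		= sdhci_set_clock,
	.set_bus_width		= sdhci_set_bus_width,
	.set_uhs_signaling	= sdhci_set_uhs_signaling,
};
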
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
5d249ac37fc2 ("mmc: sdhci-of-arasan: Fix SDHCI_RESET_ALL for CQHCI")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5d249ac37fc2396e8acc1adb0650cdacae5a990d Mon Sep 17 00:00:00 2001
From: Brian Norris <briannorris(a)chromium.org>
Date: Wed, 26 Oct 2022 12:42:04 -0700
Subject: [PATCH] mmc: sdhci-of-arasan: Fix SDHCI_RESET_ALL for CQHCI
SDHCI_RESET_ALL resets will reset the hardware CQE state, but we aren't
tracking that properly in software. When out of sync, we may trigger
various timeouts.
It's not typical to perform resets while CQE is enabled, but one
particular case I hit commonly enough is mmc_suspend() -> mmc_power_off().
Typically we will eventually deactivate CQE (cqhci_suspend() ->
cqhci_deactivate()), but that's not guaranteed -- in particular, if
we perform a partial (e.g., interrupted) system suspend.
The same bug was already found and fixed for two other drivers, in v5.7
and v5.9:
5cf583f1fb9c ("mmc: sdhci-msm: Deactivate CQE during SDHC reset")
df57d73276b8 ("mmc: sdhci-pci: Fix SDHCI_RESET_ALL for CQHCI for Intel
GLK-based controllers")
The latter is especially prescient, saying "other drivers using CQHCI
might benefit from a similar change, if they also have CQHCI reset by
SDHCI_RESET_ALL."
So like these other patches, deactivate CQHCI when resetting the
controller. Do this via the new sdhci_and_cqhci_reset() helper.
This patch depends on (and should not compile without) the patch
entitled "mmc: cqhci: Provide helper for resetting both SDHCI and
CQHCI".
Fixes: 84362d79f436 ("mmc: sdhci-of-arasan: Add CQHCI support for arasan,sdhci-5.1")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
Reviewed-by: Guenter Roeck <linux(a)roeck-us.net>
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
Link: https://lore.kernel.org/r/20221026124150.v4.2.I29f6a2189e84e35ad89c1833793d…
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/sdhci-of-arasan.c b/drivers/mmc/host/sdhci-of-arasan.c
index 3997cad1f793..cfb891430174 100644
--- a/drivers/mmc/host/sdhci-of-arasan.c
+++ b/drivers/mmc/host/sdhci-of-arasan.c
@@ -25,6 +25,7 @@
#include <linux/firmware/xlnx-zynqmp.h>
#include "cqhci.h"
+#include "sdhci-cqhci.h"
#include "sdhci-pltfm.h"
#define SDHCI_ARASAN_VENDOR_REGISTER 0x78
@@ -366,7 +367,7 @@ static void sdhci_arasan_reset(struct sdhci_host *host, u8 mask)
struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
struct sdhci_arasan_data *sdhci_arasan = sdhci_pltfm_priv(pltfm_host);
- sdhci_reset(host, mask);
+ sdhci_and_cqhci_reset(host, mask);
if (sdhci_arasan->quirks & SDHCI_ARASAN_QUIRK_FORCE_CDTEST) {
ctrl = sdhci_readb(host, SDHCI_HOST_CONTROL);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f002f45a00ee ("mmc: sdhci-esdhc-imx: use the correct host caps for MMC_CAP_8_BIT_DATA")
1ed5c3b22fc7 ("mmc: sdhci-esdhc-imx: Propagate ESDHC_FLAG_HS400* only on 8bit bus")
2991ad76d253 ("mmc: sdhci-esdhc-imx: advertise HS400 mode through MMC caps")
854a22997ad5 ("mmc: sdhci-esdhc-imx: Convert the driver to DT-only")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f002f45a00ee14214d96b18b9a555fe2c56afb20 Mon Sep 17 00:00:00 2001
From: Haibo Chen <haibo.chen(a)nxp.com>
Date: Tue, 8 Nov 2022 15:45:03 +0800
Subject: [PATCH] mmc: sdhci-esdhc-imx: use the correct host caps for
MMC_CAP_8_BIT_DATA
MMC_CAP_8_BIT_DATA belongs to struct mmc_host, not struct sdhci_host.
So correct it here.
Fixes: 1ed5c3b22fc7 ("mmc: sdhci-esdhc-imx: Propagate ESDHC_FLAG_HS400* only on 8bit bus")
Signed-off-by: Haibo Chen <haibo.chen(a)nxp.com>
Cc: stable(a)vger.kernel.org
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
Link: https://lore.kernel.org/r/1667893503-20583-1-git-send-email-haibo.chen@nxp.…
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c b/drivers/mmc/host/sdhci-esdhc-imx.c
index 89225faa242a..31ea0a2fce35 100644
--- a/drivers/mmc/host/sdhci-esdhc-imx.c
+++ b/drivers/mmc/host/sdhci-esdhc-imx.c
@@ -1672,14 +1672,14 @@ static int sdhci_esdhc_imx_probe(struct platform_device *pdev)
if (imx_data->socdata->flags & ESDHC_FLAG_ERR004536)
host->quirks |= SDHCI_QUIRK_BROKEN_ADMA;
- if (host->caps & MMC_CAP_8_BIT_DATA &&
+ if (host->mmc->caps & MMC_CAP_8_BIT_DATA &&
imx_data->socdata->flags & ESDHC_FLAG_HS400)
host->mmc->caps2 |= MMC_CAP2_HS400;
if (imx_data->socdata->flags & ESDHC_FLAG_BROKEN_AUTO_CMD23)
host->quirks2 |= SDHCI_QUIRK2_ACMD23_BROKEN;
- if (host->caps & MMC_CAP_8_BIT_DATA &&
+ if (host->mmc->caps & MMC_CAP_8_BIT_DATA &&
imx_data->socdata->flags & ESDHC_FLAG_HS400_ES) {
host->mmc->caps2 |= MMC_CAP2_HS400_ES;
host->mmc_host_ops.hs400_enhanced_strobe =
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
56baa208f910 ("mmc: sdhci-brcmstb: Fix SDHCI_RESET_ALL for CQHCI")
6bcc55fe648b ("mmc: sdhci-brcmstb: Enable Clock Gating to save power")
f3a70f991dd0 ("mmc: sdhci-brcmstb: Re-organize flags")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 56baa208f91061ff27ec2d93fbc483f624d373b4 Mon Sep 17 00:00:00 2001
From: Brian Norris <briannorris(a)chromium.org>
Date: Wed, 26 Oct 2022 12:42:05 -0700
Subject: [PATCH] mmc: sdhci-brcmstb: Fix SDHCI_RESET_ALL for CQHCI
[[ NOTE: this is completely untested by the author, but included solely
because, as noted in commit df57d73276b8 ("mmc: sdhci-pci: Fix
SDHCI_RESET_ALL for CQHCI for Intel GLK-based controllers"), "other
drivers using CQHCI might benefit from a similar change, if they
also have CQHCI reset by SDHCI_RESET_ALL." We've now seen the same
bug on at least MSM, Arasan, and Intel hardware. ]]
SDHCI_RESET_ALL resets will reset the hardware CQE state, but we aren't
tracking that properly in software. When out of sync, we may trigger
various timeouts.
It's not typical to perform resets while CQE is enabled, but this may
occur in some suspend or error recovery scenarios.
Include this fix by way of the new sdhci_and_cqhci_reset() helper.
I only patch the bcm7216 variant even though others potentially *could*
provide the 'supports-cqe' property (and thus enable CQHCI), because
d46ba2d17f90 ("mmc: sdhci-brcmstb: Add support for Command Queuing
(CQE)") and some Broadcom folks confirm that only the 7216 variant
actually supports it.
This patch depends on (and should not compile without) the patch
entitled "mmc: cqhci: Provide helper for resetting both SDHCI and
CQHCI".
Fixes: d46ba2d17f90 ("mmc: sdhci-brcmstb: Add support for Command Queuing (CQE)")
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
Reviewed-by: Florian Fainelli <f.fainelli(a)gmail.com>
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20221026124150.v4.3.I6a715feab6d01f760455865e968e…
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/sdhci-brcmstb.c b/drivers/mmc/host/sdhci-brcmstb.c
index aff36a933ebe..55d8bd232695 100644
--- a/drivers/mmc/host/sdhci-brcmstb.c
+++ b/drivers/mmc/host/sdhci-brcmstb.c
@@ -12,6 +12,7 @@
#include <linux/bitops.h>
#include <linux/delay.h>
+#include "sdhci-cqhci.h"
#include "sdhci-pltfm.h"
#include "cqhci.h"
@@ -55,7 +56,7 @@ static void brcmstb_reset(struct sdhci_host *host, u8 mask)
struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
struct sdhci_brcmstb_priv *priv = sdhci_pltfm_priv(pltfm_host);
- sdhci_reset(host, mask);
+ sdhci_and_cqhci_reset(host, mask);
/* Reset will clear this, so re-enable it */
if (priv->flags & BRCMSTB_PRIV_FLAGS_GATE_CLOCK)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
56baa208f910 ("mmc: sdhci-brcmstb: Fix SDHCI_RESET_ALL for CQHCI")
6bcc55fe648b ("mmc: sdhci-brcmstb: Enable Clock Gating to save power")
f3a70f991dd0 ("mmc: sdhci-brcmstb: Re-organize flags")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 56baa208f91061ff27ec2d93fbc483f624d373b4 Mon Sep 17 00:00:00 2001
From: Brian Norris <briannorris(a)chromium.org>
Date: Wed, 26 Oct 2022 12:42:05 -0700
Subject: [PATCH] mmc: sdhci-brcmstb: Fix SDHCI_RESET_ALL for CQHCI
[[ NOTE: this is completely untested by the author, but included solely
because, as noted in commit df57d73276b8 ("mmc: sdhci-pci: Fix
SDHCI_RESET_ALL for CQHCI for Intel GLK-based controllers"), "other
drivers using CQHCI might benefit from a similar change, if they
also have CQHCI reset by SDHCI_RESET_ALL." We've now seen the same
bug on at least MSM, Arasan, and Intel hardware. ]]
SDHCI_RESET_ALL resets will reset the hardware CQE state, but we aren't
tracking that properly in software. When out of sync, we may trigger
various timeouts.
It's not typical to perform resets while CQE is enabled, but this may
occur in some suspend or error recovery scenarios.
Include this fix by way of the new sdhci_and_cqhci_reset() helper.
I only patch the bcm7216 variant even though others potentially *could*
provide the 'supports-cqe' property (and thus enable CQHCI), because
d46ba2d17f90 ("mmc: sdhci-brcmstb: Add support for Command Queuing
(CQE)") and some Broadcom folks confirm that only the 7216 variant
actually supports it.
This patch depends on (and should not compile without) the patch
entitled "mmc: cqhci: Provide helper for resetting both SDHCI and
CQHCI".
Fixes: d46ba2d17f90 ("mmc: sdhci-brcmstb: Add support for Command Queuing (CQE)")
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
Reviewed-by: Florian Fainelli <f.fainelli(a)gmail.com>
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20221026124150.v4.3.I6a715feab6d01f760455865e968e…
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/sdhci-brcmstb.c b/drivers/mmc/host/sdhci-brcmstb.c
index aff36a933ebe..55d8bd232695 100644
--- a/drivers/mmc/host/sdhci-brcmstb.c
+++ b/drivers/mmc/host/sdhci-brcmstb.c
@@ -12,6 +12,7 @@
#include <linux/bitops.h>
#include <linux/delay.h>
+#include "sdhci-cqhci.h"
#include "sdhci-pltfm.h"
#include "cqhci.h"
@@ -55,7 +56,7 @@ static void brcmstb_reset(struct sdhci_host *host, u8 mask)
struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
struct sdhci_brcmstb_priv *priv = sdhci_pltfm_priv(pltfm_host);
- sdhci_reset(host, mask);
+ sdhci_and_cqhci_reset(host, mask);
/* Reset will clear this, so re-enable it */
if (priv->flags & BRCMSTB_PRIV_FLAGS_GATE_CLOCK)
Hi,
One more fix was just merged for the DRM buddy regression from 5.19.
This was merged before it had been confirmed successful by some of the reporters, so it didn't
get the stable tag added directly.
e0b26b948246 ("drm/amdgpu: Fix the lpfn checking condition in drm buddy")
Can you please take it to 6.0.y? Here are some other tags you can add to it when committed.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2213
Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu")
Thanks,
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
2081b3bd0c11 ("arm64: fix rodata=full again")
b9dd04a20f81 ("arm64/mm: fold check for KFENCE into can_set_direct_map()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2081b3bd0c11757725dcab9ba5d38e1bddb03459 Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ardb(a)kernel.org>
Date: Thu, 3 Nov 2022 18:00:15 +0100
Subject: [PATCH] arm64: fix rodata=full again
Commit 2e8cff0a0eee87b2 ("arm64: fix rodata=full") addressed a couple of
issues with the rodata= kernel command line option, which is not a
simple boolean on arm64, and inadvertently got broken due to changes in
the generic bool handling.
Unfortunately, the resulting code never clears the rodata_full boolean
variable if it defaults to true and rodata=on or rodata=off is passed,
as the generic code is not aware of the existence of this variable.
Given the way this code is plumbed together, clearing rodata_full when
returning false from arch_parse_debug_rodata() may result in
inconsistencies if the generic code decides that it cannot parse the
right hand side, so the best way to deal with this is to only take
rodata_full in account if rodata_enabled is also true.
Fixes: 2e8cff0a0eee ("arm64: fix rodata=full")
Cc: <stable(a)vger.kernel.org> # 6.0.x
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Acked-by: Will Deacon <will(a)kernel.org>
Link: https://lore.kernel.org/r/20221103170015.4124426-1-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com>
diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index d107c3d434e2..5922178d7a06 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -26,7 +26,7 @@ bool can_set_direct_map(void)
* mapped at page granularity, so that it is possible to
* protect/unprotect single pages.
*/
- return rodata_full || debug_pagealloc_enabled() ||
+ return (rodata_enabled && rodata_full) || debug_pagealloc_enabled() ||
IS_ENABLED(CONFIG_KFENCE);
}
@@ -102,7 +102,8 @@ static int change_memory_common(unsigned long addr, int numpages,
* If we are manipulating read-only permissions, apply the same
* change to the linear mapping of the pages that back this VM area.
*/
- if (rodata_full && (pgprot_val(set_mask) == PTE_RDONLY ||
+ if (rodata_enabled &&
+ rodata_full && (pgprot_val(set_mask) == PTE_RDONLY ||
pgprot_val(clear_mask) == PTE_RDONLY)) {
for (i = 0; i < area->nr_pages; i++) {
__change_memory_common((u64)page_address(area->pages[i]),
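
To make the failure mode concrete, here is a small stand-alone sketch (illustrative only, not kernel code; the kernel's can_set_direct_map() also considers debug_pagealloc and KFENCE) of the state this fix guards against: the arch may default rodata_full to true, while the generic rodata= parser only affects rodata_enabled, so any check that looks at rodata_full alone sees stale state.

#include <stdbool.h>
#include <stdio.h>

/* Illustrative flags modelling the kernel variables involved. */
static bool rodata_enabled = false;	/* models "rodata=off" having been parsed */
static bool rodata_full = true;		/* arch default, never cleared by the parser */

static bool can_set_direct_map(void)
{
	/* Gate on both flags so a stale rodata_full is ignored. */
	return rodata_enabled && rodata_full;
}

int main(void)
{
	/* Prints 0: with rodata=off, the stale rodata_full no longer matters. */
	printf("can_set_direct_map() = %d\n", can_set_direct_map());
	return 0;
}
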
Expose the routine zap_page_range_single to zap a range within a single
vma. The madvise routine madvise_dontneed_single_vma can use this
routine as it explicitly operates on a single vma. Also, update the mmu
notification range in zap_page_range_single to take hugetlb pmd sharing
into account. This is required as MADV_DONTNEED supports hugetlb vmas.
Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reported-by: Wei Chen <harperchen1110(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
---
include/linux/mm.h | 27 +++++++++++++++++++--------
mm/madvise.c | 6 +++---
mm/memory.c | 23 +++++++++++------------
3 files changed, 33 insertions(+), 23 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e8fc35edaee0..9e7cad65dfde 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1881,6 +1881,23 @@ static void __maybe_unused show_free_areas(unsigned int flags, nodemask_t *nodem
__show_free_areas(flags, nodemask, MAX_NR_ZONES - 1);
}
+/*
+ * Parameter block passed down to zap_pte_range in exceptional cases.
+ */
+struct zap_details {
+ struct folio *single_folio; /* Locked folio to be unmapped */
+ bool even_cows; /* Zap COWed private pages too? */
+ zap_flags_t zap_flags; /* Extra flags for zapping */
+};
+
+/*
+ * Whether to drop the pte markers, for example, the uffd-wp information for
+ * file-backed memory. This should only be specified when we will completely
+ * drop the page in the mm, either by truncation or unmapping of the vma. By
+ * default, the flag is not set.
+ */
+#define ZAP_FLAG_DROP_MARKER ((__force zap_flags_t) BIT(0))
+
#ifdef CONFIG_MMU
extern bool can_do_mlock(void);
#else
@@ -1898,6 +1915,8 @@ void zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
unsigned long size);
void zap_page_range(struct vm_area_struct *vma, unsigned long address,
unsigned long size);
+void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
+ unsigned long size, struct zap_details *details);
void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
struct vm_area_struct *start_vma, unsigned long start,
unsigned long end);
@@ -3529,12 +3548,4 @@ madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
}
#endif
-/*
- * Whether to drop the pte markers, for example, the uffd-wp information for
- * file-backed memory. This should only be specified when we will completely
- * drop the page in the mm, either by truncation or unmapping of the vma. By
- * default, the flag is not set.
- */
-#define ZAP_FLAG_DROP_MARKER ((__force zap_flags_t) BIT(0))
-
#endif /* _LINUX_MM_H */
diff --git a/mm/madvise.c b/mm/madvise.c
index 68a23104687f..b2f1860a353e 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -785,8 +785,8 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
* Application no longer needs these pages. If the pages are dirty,
* it's OK to just throw them away. The app will be more careful about
* data it wants to keep. Be sure to free swap resources too. The
- * zap_page_range call sets things up for shrink_active_list to actually free
- * these pages later if no one else has touched them in the meantime,
+ * zap_page_range_single call sets things up for shrink_active_list to actually
+ * free these pages later if no one else has touched them in the meantime,
* although we could add these pages to a global reuse list for
* shrink_active_list to pick up before reclaiming other pages.
*
@@ -803,7 +803,7 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
- zap_page_range(vma, start, end - start);
+ zap_page_range_single(vma, start, end - start, NULL);
return 0;
}
diff --git a/mm/memory.c b/mm/memory.c
index 98ddb91df9a7..ebdbd395cfad 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1294,15 +1294,6 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
return ret;
}
-/*
- * Parameter block passed down to zap_pte_range in exceptional cases.
- */
-struct zap_details {
- struct folio *single_folio; /* Locked folio to be unmapped */
- bool even_cows; /* Zap COWed private pages too? */
- zap_flags_t zap_flags; /* Extra flags for zapping */
-};
-
/* Whether we should zap all COWed (private) pages too */
static inline bool should_zap_cows(struct zap_details *details)
{
@@ -1736,19 +1727,27 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
*
* The range must fit into one VMA.
*/
-static void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
+void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
unsigned long size, struct zap_details *details)
{
+ unsigned long end = address + size;
struct mmu_notifier_range range;
struct mmu_gather tlb;
lru_add_drain();
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
- address, address + size);
+ address, end);
+ if (is_vm_hugetlb_page(vma))
+ adjust_range_if_pmd_sharing_possible(vma, &range.start,
+ &range.end);
tlb_gather_mmu(&tlb, vma->vm_mm);
update_hiwater_rss(vma->vm_mm);
mmu_notifier_invalidate_range_start(&range);
- unmap_single_vma(&tlb, vma, address, range.end, details);
+ /*
+ * unmap 'address-end' not 'range.start-range.end' as range
+ * could have been expanded for hugetlb pmd sharing.
+ */
+ unmap_single_vma(&tlb, vma, address, end, details);
mmu_notifier_invalidate_range_end(&range);
tlb_finish_mmu(&tlb);
}
--
2.37.3
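
A minimal, hypothetical caller sketch of the interface exported by this patch: code that already operates on a single VMA (as the MADV_DONTNEED path above does) can zap a sub-range of it directly, passing NULL details in the common case. The helper name drop_vma_range is made up for the example and assumes the patch is applied.

#include <linux/mm.h>

/*
 * Hypothetical helper: drop the pages backing [start, start + len) of a
 * single VMA.  The range must lie entirely within @vma; @details may be
 * NULL when no special zap behaviour is needed.
 */
static void drop_vma_range(struct vm_area_struct *vma,
			   unsigned long start, unsigned long len)
{
	zap_page_range_single(vma, start, len, NULL);
}
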
The PHY is powered on during phy-init by setting the SW_PWRDN bit in the
COM_POWER_DOWN_CTRL register and then setting the same bit in the
PCS_POWER_DOWN_CONTROL register that belongs to the USB part of the
PHY.
Currently, whether power on succeeds depends on probe order and having
the USB part of the PHY be initialised first. In case the DP part of the
PHY is instead initialised first, the intended power on of the USB block
results in a corrupted DP_PHY register (e.g. DP_PHY_AUX_CFG8).
Add a pointer to the USB part of the PHY to the driver data and use that
to power on the PHY also if the DP part of the PHY is initialised first.
Fixes: 52e013d0bffa ("phy: qcom-qmp: Add support for DP in USB3+DP combo phy")
Cc: stable(a)vger.kernel.org # 5.10
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/phy/qualcomm/phy-qcom-qmp-combo.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/phy/qualcomm/phy-qcom-qmp-combo.c b/drivers/phy/qualcomm/phy-qcom-qmp-combo.c
index 40c25a0ead23..17707f68d482 100644
--- a/drivers/phy/qualcomm/phy-qcom-qmp-combo.c
+++ b/drivers/phy/qualcomm/phy-qcom-qmp-combo.c
@@ -932,6 +932,7 @@ struct qcom_qmp {
struct regulator_bulk_data *vregs;
struct qmp_phy **phys;
+ struct qmp_phy *usb_phy;
struct mutex phy_mutex;
int init_count;
@@ -1911,7 +1912,7 @@ static int qmp_combo_com_init(struct qmp_phy *qphy)
{
struct qcom_qmp *qmp = qphy->qmp;
const struct qmp_phy_cfg *cfg = qphy->cfg;
- void __iomem *pcs = qphy->pcs;
+ struct qmp_phy *usb_phy = qmp->usb_phy;
void __iomem *dp_com = qmp->dp_com;
int ret;
@@ -1963,7 +1964,8 @@ static int qmp_combo_com_init(struct qmp_phy *qphy)
qphy_clrbits(dp_com, QPHY_V3_DP_COM_SWI_CTRL, 0x03);
qphy_clrbits(dp_com, QPHY_V3_DP_COM_SW_RESET, SW_RESET);
- qphy_setbits(pcs, cfg->regs[QPHY_PCS_POWER_DOWN_CONTROL], SW_PWRDN);
+ qphy_setbits(usb_phy->pcs, usb_phy->cfg->regs[QPHY_PCS_POWER_DOWN_CONTROL],
+ SW_PWRDN);
mutex_unlock(&qmp->phy_mutex);
@@ -2831,6 +2833,8 @@ static int qmp_combo_probe(struct platform_device *pdev)
goto err_node_put;
}
+ qmp->usb_phy = qmp->phys[id];
+
/*
* Register the pipe clock provided by phy.
* See function description to see details of this pipe clock.
@@ -2846,6 +2850,9 @@ static int qmp_combo_probe(struct platform_device *pdev)
id++;
}
+ if (!qmp->usb_phy)
+ return -EINVAL;
+
phy_provider = devm_of_phy_provider_register(dev, of_phy_simple_xlate);
return PTR_ERR_OR_ZERO(phy_provider);
--
2.37.4
The SC8180X has two resets but the DP configuration erroneously
described only one.
If the DP part of the PHY is initialised before the USB part (e.g.
depending on probe order), only the first reset would be asserted.
Fixes: 1633802cd4ac ("phy: qcom: qmp: Add SC8180x USB/DP combo")
Cc: stable(a)vger.kernel.org # 5.15
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/phy/qualcomm/phy-qcom-qmp-combo.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/phy/qualcomm/phy-qcom-qmp-combo.c b/drivers/phy/qualcomm/phy-qcom-qmp-combo.c
index cc53e2f99121..40c25a0ead23 100644
--- a/drivers/phy/qualcomm/phy-qcom-qmp-combo.c
+++ b/drivers/phy/qualcomm/phy-qcom-qmp-combo.c
@@ -1177,8 +1177,8 @@ static const struct qmp_phy_cfg sc8180x_dpphy_cfg = {
.clk_list = qmp_v3_phy_clk_l,
.num_clks = ARRAY_SIZE(qmp_v3_phy_clk_l),
- .reset_list = sc7180_usb3phy_reset_l,
- .num_resets = ARRAY_SIZE(sc7180_usb3phy_reset_l),
+ .reset_list = msm8996_usb3phy_reset_l,
+ .num_resets = ARRAY_SIZE(msm8996_usb3phy_reset_l),
.vreg_list = qmp_phy_vreg_l,
.num_vregs = ARRAY_SIZE(qmp_phy_vreg_l),
.regs = qmp_v3_usb3phy_regs_layout,
--
2.37.4
The SM8250 only uses three clocks but the DP configuration erroneously
described four clocks.
In case the DP part of the PHY is initialised before the USB part, this
would lead to uninitialised memory beyond the bulk-clocks array being
treated as a clock pointer, as the clocks are requested based on the USB
configuration.
Fixes: aff188feb5e1 ("phy: qcom-qmp: add support for sm8250-usb3-dp phy")
Cc: stable(a)vger.kernel.org # 5.13
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/phy/qualcomm/phy-qcom-qmp-combo.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/phy/qualcomm/phy-qcom-qmp-combo.c b/drivers/phy/qualcomm/phy-qcom-qmp-combo.c
index 5e11b6a1d189..bb38b18258ca 100644
--- a/drivers/phy/qualcomm/phy-qcom-qmp-combo.c
+++ b/drivers/phy/qualcomm/phy-qcom-qmp-combo.c
@@ -1270,8 +1270,8 @@ static const struct qmp_phy_cfg sm8250_dpphy_cfg = {
.swing_hbr3_hbr2 = &qmp_dp_v3_voltage_swing_hbr3_hbr2,
.pre_emphasis_hbr3_hbr2 = &qmp_dp_v3_pre_emphasis_hbr3_hbr2,
- .clk_list = qmp_v4_phy_clk_l,
- .num_clks = ARRAY_SIZE(qmp_v4_phy_clk_l),
+ .clk_list = qmp_v4_sm8250_usbphy_clk_l,
+ .num_clks = ARRAY_SIZE(qmp_v4_sm8250_usbphy_clk_l),
.reset_list = msm8996_usb3phy_reset_l,
.num_resets = ARRAY_SIZE(msm8996_usb3phy_reset_l),
.vreg_list = qmp_phy_vreg_l,
--
2.37.4
The PHY is powered on during phy-init by setting the SW_PWRDN bit in the
COM_POWER_DOWN_CTRL register and then setting the same bit in the
PCS_POWER_DOWN_CONTROL register that belongs to the USB part of the
PHY.
Currently, whether power on succeeds depends on probe order and having
the USB part of the PHY be initialised first. In case the DP part of the
PHY is instead initialised first, the intended power on of the USB block
results in a corrupted DP_PHY register (e.g. DP_PHY_AUX_CFG8).
Add a pointer to the USB part of the PHY to the driver data and use that
to power on the PHY also if the DP part of the PHY is initialised first.
Fixes: 52e013d0bffa ("phy: qcom-qmp: Add support for DP in USB3+DP combo phy")
Cc: stable(a)vger.kernel.org # 5.10
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/phy/qualcomm/phy-qcom-qmp-combo.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/phy/qualcomm/phy-qcom-qmp-combo.c b/drivers/phy/qualcomm/phy-qcom-qmp-combo.c
index 40c25a0ead23..17707f68d482 100644
--- a/drivers/phy/qualcomm/phy-qcom-qmp-combo.c
+++ b/drivers/phy/qualcomm/phy-qcom-qmp-combo.c
@@ -932,6 +932,7 @@ struct qcom_qmp {
struct regulator_bulk_data *vregs;
struct qmp_phy **phys;
+ struct qmp_phy *usb_phy;
struct mutex phy_mutex;
int init_count;
@@ -1911,7 +1912,7 @@ static int qmp_combo_com_init(struct qmp_phy *qphy)
{
struct qcom_qmp *qmp = qphy->qmp;
const struct qmp_phy_cfg *cfg = qphy->cfg;
- void __iomem *pcs = qphy->pcs;
+ struct qmp_phy *usb_phy = qmp->usb_phy;
void __iomem *dp_com = qmp->dp_com;
int ret;
@@ -1963,7 +1964,8 @@ static int qmp_combo_com_init(struct qmp_phy *qphy)
qphy_clrbits(dp_com, QPHY_V3_DP_COM_SWI_CTRL, 0x03);
qphy_clrbits(dp_com, QPHY_V3_DP_COM_SW_RESET, SW_RESET);
- qphy_setbits(pcs, cfg->regs[QPHY_PCS_POWER_DOWN_CONTROL], SW_PWRDN);
+ qphy_setbits(usb_phy->pcs, usb_phy->cfg->regs[QPHY_PCS_POWER_DOWN_CONTROL],
+ SW_PWRDN);
mutex_unlock(&qmp->phy_mutex);
@@ -2831,6 +2833,8 @@ static int qmp_combo_probe(struct platform_device *pdev)
goto err_node_put;
}
+ qmp->usb_phy = qmp->phys[id];
+
/*
* Register the pipe clock provided by phy.
* See function description to see details of this pipe clock.
@@ -2846,6 +2850,9 @@ static int qmp_combo_probe(struct platform_device *pdev)
id++;
}
+ if (!qmp->usb_phy)
+ return -EINVAL;
+
phy_provider = devm_of_phy_provider_register(dev, of_phy_simple_xlate);
return PTR_ERR_OR_ZERO(phy_provider);
--
2.37.4