On hardware that supports Indirect Branch Tracking (IBT), Hyper-V VMs
with ConfigVersion 9.3 or later support IBT in the guest. However,
current versions of Hyper-V have a bug in that there's not an ENDBR64
instruction at the beginning of the hypercall page. Since hypercalls are
made with an indirect call to the hypercall page, all hypercall attempts
fail with an exception and Linux panics.
A Hyper-V fix is in progress to add ENDBR64. But guard against the Linux
panic by clearing X86_FEATURE_IBT if the hypercall page doesn't start
with ENDBR. The VM will boot and run without IBT.
If future Linux 32-bit kernels were to support IBT, additional hypercall
page hackery would be needed to make IBT work for such kernels in a
Hyper-V VM.
Cc: stable(a)vger.kernel.org
Signed-off-by: Michael Kelley <mikelley(a)microsoft.com>
---
Changes in v2:
* Use pr_warn() instead of pr_info() [Peter Zijlstra]
arch/x86/hyperv/hv_init.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 6c04b52..5cbee24 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -14,6 +14,7 @@
#include <asm/apic.h>
#include <asm/desc.h>
#include <asm/sev.h>
+#include <asm/ibt.h>
#include <asm/hypervisor.h>
#include <asm/hyperv-tlfs.h>
#include <asm/mshyperv.h>
@@ -472,6 +473,26 @@ void __init hyperv_init(void)
}
/*
+ * Some versions of Hyper-V that provide IBT in guest VMs have a bug
+ * in that there's no ENDBR64 instruction at the entry to the
+ * hypercall page. Because hypercalls are invoked via an indirect call
+ * to the hypercall page, all hypercall attempts fail when IBT is
+ * enabled, and Linux panics. For such buggy versions, disable IBT.
+ *
+ * Fixed versions of Hyper-V always provide ENDBR64 on the hypercall
+ * page, so if future Linux kernel versions enable IBT for 32-bit
+ * builds, additional hypercall page hackery will be required here
+ * to provide an ENDBR32.
+ */
+#ifdef CONFIG_X86_KERNEL_IBT
+ if (cpu_feature_enabled(X86_FEATURE_IBT) &&
+ *(u32 *)hv_hypercall_pg != gen_endbr()) {
+ setup_clear_cpu_cap(X86_FEATURE_IBT);
+ pr_warn("Hyper-V: Disabling IBT because of Hyper-V bug\n");
+ }
+#endif
+
+ /*
* hyperv_init() is called before LAPIC is initialized: see
* apic_intr_mode_init() -> x86_platform.apic_post_init() and
* apic_bsp_setup() -> setup_local_APIC(). The direct-mode STIMER
--
1.8.3.1
Hi Andres,
With this commit applied to the 6.1 and later kernels (others not
tested) the iowait time ("wa" field in top) in an ARM64 build running
on a 4 core CPU (a Raspberry Pi 4 B) increases to 25%, as if one core
is permanently blocked on I/O. The change can be observed after
installing mariadb-server (no configuration or use is required). After
reverting just this commit, "wa" drops to zero again.
I can believe that this change hasn't negatively affected performance,
but the result is misleading. I also think it's pushing the boundaries
of what a back-port to stable should do.
Phil
From: Robin Murphy <robin.murphy(a)arm.com>
commit 309a15cb16bb075da1c99d46fb457db6a1a2669e upstream
To work around MMU-700 erratum 2812531 we need to ensure that certain
sequences of commands cannot be issued without an intervening sync. In
practice this falls out of our current command-batching machinery
anyway - each batch only contains a single type of invalidation command,
and ends with a sync. The only exception is when a batch is sufficiently
large to need issuing across multiple command queue slots, wherein the
earlier slots will not contain a sync and thus may in theory interleave
with another batch being issued in parallel to create an affected
sequence across the slot boundary.
Since MMU-700 supports range invalidate commands and thus we will prefer
to use them (which also happens to avoid conditions for other errata),
I'm not entirely sure it's even possible for a single high-level
invalidate call to generate a batch of more than 63 commands, but for
the sake of robustness and documentation, wire up an option to enforce
that a sync is always inserted for every slot issued.
The other aspect is that the relative order of DVM commands cannot be
controlled, so DVM cannot be used. Again that is already the status quo,
but since we have at least defined ARM_SMMU_FEAT_BTM, we can explicitly
disable it for documentation purposes even if it's not wired up anywhere
yet.
Signed-off-by: Robin Murphy <robin.murphy(a)arm.com>
Reviewed-by: Nicolin Chen <nicolinc(a)nvidia.com>
Link: https://lore.kernel.org/r/330221cdfd0003cd51b6c04e7ff3566741ad8374.16837312…
Signed-off-by: Will Deacon <will(a)kernel.org>
Signed-off-by: Easwar Hariharan <eahariha(a)linux.microsoft.com>
---
Documentation/arm64/silicon-errata.rst | 4 +++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 39 +++++++++++++++++++++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 +
3 files changed, 44 insertions(+)
diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index 55492fea4427..120784507bc0 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -138,6 +138,10 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | MMU-500 | #841119,826419 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | MMU-600 | #1076982 | N/A |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM | MMU-700 | #2812531 | N/A |
++----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_845719 |
+----------------+-----------------+-----------------+-----------------------------+
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index d4d8bfee9feb..3e20c60e9c32 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -882,6 +882,12 @@ static void arm_smmu_cmdq_batch_add(struct arm_smmu_device *smmu,
{
int index;
+ if (cmds->num == CMDQ_BATCH_ENTRIES - 1 &&
+ (smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC)) {
+ arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, true);
+ cmds->num = 0;
+ }
+
if (cmds->num == CMDQ_BATCH_ENTRIES) {
arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, false);
cmds->num = 0;
@@ -3410,6 +3416,39 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
return 0;
}
+#define IIDR_IMPLEMENTER_ARM 0x43b
+#define IIDR_PRODUCTID_ARM_MMU_600 0x483
+#define IIDR_PRODUCTID_ARM_MMU_700 0x487
+
+static void arm_smmu_device_iidr_probe(struct arm_smmu_device *smmu)
+{
+ u32 reg;
+ unsigned int implementer, productid, variant, revision;
+
+ reg = readl_relaxed(smmu->base + ARM_SMMU_IIDR);
+ implementer = FIELD_GET(IIDR_IMPLEMENTER, reg);
+ productid = FIELD_GET(IIDR_PRODUCTID, reg);
+ variant = FIELD_GET(IIDR_VARIANT, reg);
+ revision = FIELD_GET(IIDR_REVISION, reg);
+
+ switch (implementer) {
+ case IIDR_IMPLEMENTER_ARM:
+ switch (productid) {
+ case IIDR_PRODUCTID_ARM_MMU_600:
+ /* Arm erratum 1076982 */
+ if (variant == 0 && revision <= 2)
+ smmu->features &= ~ARM_SMMU_FEAT_SEV;
+ break;
+ case IIDR_PRODUCTID_ARM_MMU_700:
+ /* Arm erratum 2812531 */
+ smmu->features &= ~ARM_SMMU_FEAT_BTM;
+ smmu->options |= ARM_SMMU_OPT_CMDQ_FORCE_SYNC;
+ break;
+ }
+ break;
+ }
+}
+
static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
{
u32 reg;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index cd48590ada30..406ac0426762 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -644,6 +644,7 @@ struct arm_smmu_device {
#define ARM_SMMU_OPT_SKIP_PREFETCH (1 << 0)
#define ARM_SMMU_OPT_PAGE0_REGS_ONLY (1 << 1)
#define ARM_SMMU_OPT_MSIPOLL (1 << 2)
+#define ARM_SMMU_OPT_CMDQ_FORCE_SYNC (1 << 3)
u32 options;
struct arm_smmu_cmdq cmdq;
--
2.25.1
The patch titled
Subject: mm: lock VMA in dup_anon_vma() before setting ->anon_vma
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-lock-vma-in-dup_anon_vma-before-setting-anon_vma.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: mm: lock VMA in dup_anon_vma() before setting ->anon_vma
Date: Fri, 21 Jul 2023 05:46:43 +0200
When VMAs are merged, dup_anon_vma() is called with `dst` pointing to the
VMA that is being expanded to cover the area previously occupied by
another VMA. This currently happens while `dst` is not write-locked.
This means that, in the `src->anon_vma && !dst->anon_vma` case, as soon as
the assignment `dst->anon_vma = src->anon_vma` has happened, concurrent
page faults can happen on `dst` under the per-VMA lock. This is already
icky in itself, since such page faults can now install pages into `dst`
that are attached to an `anon_vma` that is not yet tied back to the
`anon_vma` with an `anon_vma_chain`. But if `anon_vma_clone()` fails due
to an out-of-memory error, things get much worse: `anon_vma_clone()` then
reverts `dst->anon_vma` back to NULL, and `dst` remains completely
unconnected to the `anon_vma`, even though we can have pages in the area
covered by `dst` that point to the `anon_vma`.
This means the `anon_vma` of such pages can be freed while the pages are
still mapped into userspace, which leads to UAF when a helper like
folio_lock_anon_vma_read() tries to look up the anon_vma of such a page.
This theoretically is a security bug, but I believe it is really hard to
actually trigger as an unprivileged user because it requires that you can
make an order-0 GFP_KERNEL allocation fail, and the page allocator tries
pretty hard to prevent that.
I think doing the vma_start_write() call inside dup_anon_vma() is the most
straightforward fix for now.
For a kernel-assisted reproducer, see the notes section of the patch mail.
Link: https://lkml.kernel.org/r/20230721034643.616851-1-jannh@google.com
Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mmap.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/mmap.c~mm-lock-vma-in-dup_anon_vma-before-setting-anon_vma
+++ a/mm/mmap.c
@@ -615,6 +615,7 @@ static inline int dup_anon_vma(struct vm
* anon pages imported.
*/
if (src->anon_vma && !dst->anon_vma) {
+ vma_start_write(dst);
dst->anon_vma = src->anon_vma;
return anon_vma_clone(dst, src);
}
_
Patches currently in -mm which might be from jannh(a)google.com are
mm-fix-memory-ordering-for-mm_lock_seq-and-vm_lock_seq.patch
mm-lock-vma-in-dup_anon_vma-before-setting-anon_vma.patch
I'm announcing the release of the 4.19.289 kernel.
All AMD processor users of the 4.19 kernel series who have not updated
their microcode to the latest version, must upgrade.
The updated 4.19.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.19.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/x86/include/asm/microcode.h | 1
arch/x86/include/asm/microcode_amd.h | 2
arch/x86/include/asm/msr-index.h | 1
arch/x86/kernel/cpu/amd.c | 199 ++++++++++++++++++++++-------------
arch/x86/kernel/cpu/common.c | 2
arch/x86/kernel/cpu/microcode/amd.c | 2
7 files changed, 135 insertions(+), 74 deletions(-)
Borislav Petkov (AMD) (3):
x86/microcode/AMD: Load late on both threads too
x86/cpu/amd: Move the errata checking functionality up
x86/cpu/amd: Add a Zenbleed fix
Greg Kroah-Hartman (1):
Linux 4.19.289
I'm announcing the release of the 5.4.250 kernel.
All AMD processor users of the 5.4 kernel series who have not updated
their microcode to the latest version, must upgrade.
The updated 5.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.4.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/x86/include/asm/microcode.h | 1
arch/x86/include/asm/microcode_amd.h | 2
arch/x86/include/asm/msr-index.h | 1
arch/x86/kernel/cpu/amd.c | 199 ++++++++++++++++++++++-------------
arch/x86/kernel/cpu/common.c | 2
arch/x86/kernel/cpu/microcode/amd.c | 2
7 files changed, 135 insertions(+), 74 deletions(-)
Borislav Petkov (AMD) (3):
x86/microcode/AMD: Load late on both threads too
x86/cpu/amd: Move the errata checking functionality up
x86/cpu/amd: Add a Zenbleed fix
Greg Kroah-Hartman (1):
Linux 5.4.250
I'm announcing the release of the 5.10.187 kernel.
All AMD processor users of the 5.10 kernel series who have not updated
their microcode to the latest version, must upgrade.
The updated 5.10.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.10.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/x86/include/asm/microcode.h | 1
arch/x86/include/asm/microcode_amd.h | 2
arch/x86/include/asm/msr-index.h | 1
arch/x86/kernel/cpu/amd.c | 199 ++++++++++++++++++++++-------------
arch/x86/kernel/cpu/common.c | 2
arch/x86/kernel/cpu/microcode/amd.c | 2
7 files changed, 135 insertions(+), 74 deletions(-)
Borislav Petkov (AMD) (3):
x86/microcode/AMD: Load late on both threads too
x86/cpu/amd: Move the errata checking functionality up
x86/cpu/amd: Add a Zenbleed fix
Greg Kroah-Hartman (1):
Linux 5.10.187
I'm announcing the release of the 5.15.122 kernel.
All AMD processor users of the 5.15 kernel series who have not updated
their microcode to the latest version, must upgrade.
The updated 5.15.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.15.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/x86/include/asm/microcode.h | 1
arch/x86/include/asm/microcode_amd.h | 2
arch/x86/include/asm/msr-index.h | 1
arch/x86/kernel/cpu/amd.c | 199 ++++++++++++++++++++++-------------
arch/x86/kernel/cpu/common.c | 2
6 files changed, 134 insertions(+), 73 deletions(-)
Borislav Petkov (AMD) (2):
x86/cpu/amd: Move the errata checking functionality up
x86/cpu/amd: Add a Zenbleed fix
Greg Kroah-Hartman (1):
Linux 5.15.122
I'm announcing the release of the 6.1.41 kernel.
All AMD processor users of the 6.1 kernel series who have not updated
their microcode to the latest version, must upgrade.
The updated 6.1.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.1.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/x86/include/asm/microcode.h | 1
arch/x86/include/asm/microcode_amd.h | 2
arch/x86/include/asm/msr-index.h | 1
arch/x86/kernel/cpu/amd.c | 199 ++++++++++++++++++++++-------------
arch/x86/kernel/cpu/common.c | 2
6 files changed, 134 insertions(+), 73 deletions(-)
Borislav Petkov (AMD) (2):
x86/cpu/amd: Move the errata checking functionality up
x86/cpu/amd: Add a Zenbleed fix
Greg Kroah-Hartman (1):
Linux 6.1.41
I'm announcing the release of the 6.4.6 kernel.
All AMD processor users of the 6.4 kernel series who have not updated
their microcode to the latest version, must upgrade.
The updated 6.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.4.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/x86/include/asm/microcode.h | 1
arch/x86/include/asm/microcode_amd.h | 2
arch/x86/include/asm/msr-index.h | 1
arch/x86/kernel/cpu/amd.c | 199 ++++++++++++++++++++++-------------
arch/x86/kernel/cpu/common.c | 2
6 files changed, 134 insertions(+), 73 deletions(-)
Borislav Petkov (AMD) (2):
x86/cpu/amd: Move the errata checking functionality up
x86/cpu/amd: Add a Zenbleed fix
Greg Kroah-Hartman (1):
Linux 6.4.6
The patch titled
Subject: mm: Fix memory ordering for mm_lock_seq and vm_lock_seq
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-fix-memory-ordering-for-mm_lock_seq-and-vm_lock_seq.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: mm: Fix memory ordering for mm_lock_seq and vm_lock_seq
Date: Sat, 22 Jul 2023 00:51:07 +0200
mm->mm_lock_seq effectively functions as a read/write lock; therefore it
must be used with acquire/release semantics.
A specific example is the interaction between userfaultfd_register() and
lock_vma_under_rcu().
userfaultfd_register() does the following from the point where it changes
a VMA's flags to the point where concurrent readers are permitted again
(in a simple scenario where only a single private VMA is accessed and no
merging/splitting is involved):
userfaultfd_register
userfaultfd_set_vm_flags
vm_flags_reset
vma_start_write
down_write(&vma->vm_lock->lock)
vma->vm_lock_seq = mm_lock_seq [marks VMA as busy]
up_write(&vma->vm_lock->lock)
vm_flags_init
[sets VM_UFFD_* in __vm_flags]
vma->vm_userfaultfd_ctx.ctx = ctx
mmap_write_unlock
vma_end_write_all
WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1) [unlocks VMA]
There are no memory barriers in between the __vm_flags update and the
mm->mm_lock_seq update that unlocks the VMA, so the unlock can be
reordered to above the `vm_flags_init()` call, which means from the
perspective of a concurrent reader, a VMA can be marked as a userfaultfd
VMA while it is not VMA-locked. That's bad, we definitely need a
store-release for the unlock operation.
The non-atomic write to vma->vm_lock_seq in vma_start_write() is mostly
fine because all accesses to vma->vm_lock_seq that matter are always
protected by the VMA lock. There is a racy read in vma_start_read()
though that can tolerate false-positives, so we should be using
WRITE_ONCE() to keep things tidy and data-race-free (including for KCSAN).
On the other side, lock_vma_under_rcu() works as follows in the relevant
region for locking and userfaultfd check:
lock_vma_under_rcu
vma_start_read
vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [early bailout]
down_read_trylock(&vma->vm_lock->lock)
vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [main check]
userfaultfd_armed
checks vma->vm_flags & __VM_UFFD_FLAGS
Here, the interesting aspect is how far down the mm->mm_lock_seq read can
be reordered - if this read is reordered down below the vma->vm_flags
access, this could cause lock_vma_under_rcu() to partly operate on
information that was read while the VMA was supposed to be locked. To
prevent this kind of downwards bleeding of the mm->mm_lock_seq read, we
need to read it with a load-acquire.
Some of the comment wording is based on suggestions by Suren.
BACKPORT WARNING: One of the functions changed by this patch (which I've
written against Linus' tree) is vma_try_start_write(), but this function
no longer exists in mm/mm-everything. I don't know whether the merged
version of this patch will be ordered before or after the patch that
removes vma_try_start_write(). If you're backporting this patch to a tree
with vma_try_start_write(), make sure this patch changes that function.
Link: https://lkml.kernel.org/r/20230721225107.942336-1-jannh@google.com
Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 29 +++++++++++++++++++++++------
include/linux/mm_types.h | 28 ++++++++++++++++++++++++++++
include/linux/mmap_lock.h | 10 ++++++++--
3 files changed, 59 insertions(+), 8 deletions(-)
--- a/include/linux/mmap_lock.h~mm-fix-memory-ordering-for-mm_lock_seq-and-vm_lock_seq
+++ a/include/linux/mmap_lock.h
@@ -76,8 +76,14 @@ static inline void mmap_assert_write_loc
static inline void vma_end_write_all(struct mm_struct *mm)
{
mmap_assert_write_locked(mm);
- /* No races during update due to exclusive mmap_lock being held */
- WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1);
+ /*
+ * Nobody can concurrently modify mm->mm_lock_seq due to exclusive
+ * mmap_lock being held.
+ * We need RELEASE semantics here to ensure that preceding stores into
+ * the VMA take effect before we unlock it with this store.
+ * Pairs with ACQUIRE semantics in vma_start_read().
+ */
+ smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1);
}
#else
static inline void vma_end_write_all(struct mm_struct *mm) {}
--- a/include/linux/mm.h~mm-fix-memory-ordering-for-mm_lock_seq-and-vm_lock_seq
+++ a/include/linux/mm.h
@@ -641,8 +641,14 @@ static inline void vma_numab_state_free(
*/
static inline bool vma_start_read(struct vm_area_struct *vma)
{
- /* Check before locking. A race might cause false locked result. */
- if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
+ /*
+ * Check before locking. A race might cause false locked result.
+ * We can use READ_ONCE() for the mm_lock_seq here, and don't need
+ * ACQUIRE semantics, because this is just a lockless check whose result
+ * we don't rely on for anything - the mm_lock_seq read against which we
+ * need ordering is below.
+ */
+ if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq))
return false;
if (unlikely(down_read_trylock(&vma->vm_lock->lock) == 0))
@@ -653,8 +659,13 @@ static inline bool vma_start_read(struct
* False unlocked result is impossible because we modify and check
* vma->vm_lock_seq under vma->vm_lock protection and mm->mm_lock_seq
* modification invalidates all existing locks.
+ *
+ * We must use ACQUIRE semantics for the mm_lock_seq so that if we are
+ * racing with vma_end_write_all(), we only start reading from the VMA
+ * after it has been unlocked.
+ * This pairs with RELEASE semantics in vma_end_write_all().
*/
- if (unlikely(vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) {
+ if (unlikely(vma->vm_lock_seq == smp_load_acquire(&vma->vm_mm->mm_lock_seq))) {
up_read(&vma->vm_lock->lock);
return false;
}
@@ -676,7 +687,7 @@ static bool __is_vma_write_locked(struct
* current task is holding mmap_write_lock, both vma->vm_lock_seq and
* mm->mm_lock_seq can't be concurrently modified.
*/
- *mm_lock_seq = READ_ONCE(vma->vm_mm->mm_lock_seq);
+ *mm_lock_seq = vma->vm_mm->mm_lock_seq;
return (vma->vm_lock_seq == *mm_lock_seq);
}
@@ -688,7 +699,13 @@ static inline void vma_start_write(struc
return;
down_write(&vma->vm_lock->lock);
- vma->vm_lock_seq = mm_lock_seq;
+ /*
+ * We should use WRITE_ONCE() here because we can have concurrent reads
+ * from the early lockless pessimistic check in vma_start_read().
+ * We don't really care about the correctness of that early check, but
+ * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy.
+ */
+ WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
up_write(&vma->vm_lock->lock);
}
@@ -702,7 +719,7 @@ static inline bool vma_try_start_write(s
if (!down_write_trylock(&vma->vm_lock->lock))
return false;
- vma->vm_lock_seq = mm_lock_seq;
+ WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
up_write(&vma->vm_lock->lock);
return true;
}
--- a/include/linux/mm_types.h~mm-fix-memory-ordering-for-mm_lock_seq-and-vm_lock_seq
+++ a/include/linux/mm_types.h
@@ -514,6 +514,20 @@ struct vm_area_struct {
};
#ifdef CONFIG_PER_VMA_LOCK
+ /*
+ * Can only be written (using WRITE_ONCE()) while holding both:
+ * - mmap_lock (in write mode)
+ * - vm_lock->lock (in write mode)
+ * Can be read reliably while holding one of:
+ * - mmap_lock (in read or write mode)
+ * - vm_lock->lock (in read or write mode)
+ * Can be read unreliably (using READ_ONCE()) for pessimistic bailout
+ * while holding nothing (except RCU to keep the VMA struct allocated).
+ *
+ * This sequence counter is explicitly allowed to overflow; sequence
+ * counter reuse can only lead to occasional unnecessary use of the
+ * slowpath.
+ */
int vm_lock_seq;
struct vma_lock *vm_lock;
@@ -679,6 +693,20 @@ struct mm_struct {
* by mmlist_lock
*/
#ifdef CONFIG_PER_VMA_LOCK
+ /*
+ * This field has lock-like semantics, meaning it is sometimes
+ * accessed with ACQUIRE/RELEASE semantics.
+ * Roughly speaking, incrementing the sequence number is
+ * equivalent to releasing locks on VMAs; reading the sequence
+ * number can be part of taking a read lock on a VMA.
+ *
+ * Can be modified under write mmap_lock using RELEASE
+ * semantics.
+ * Can be read with no other protection when holding write
+ * mmap_lock.
+ * Can be read with ACQUIRE semantics if not holding write
+ * mmap_lock.
+ */
int mm_lock_seq;
#endif
_
Patches currently in -mm which might be from jannh(a)google.com are
mm-fix-memory-ordering-for-mm_lock_seq-and-vm_lock_seq.patch
The patch titled
Subject: Revert "um: Use swap() to make code cleaner"
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
revert-um-use-swap-to-make-code-cleaner.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Subject: Revert "um: Use swap() to make code cleaner"
Date: Mon, 24 Jul 2023 17:31:31 +0300
This reverts commit 9b0da3f22307af693be80f5d3a89dc4c7f360a85.
The sigio.c is clearly user space code which is handled by
arch/um/scripts/Makefile.rules (see USER_OBJS rule).
The above mentioned commit simply broke this agreement,
we may not use Linux kernel internal headers in them without
thorough thinking.
Hence, revert the wrong commit.
Link: https://lkml.kernel.org/r/20230724143131.30090-1-andriy.shevchenko@linux.in…
Signed-off-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307212304.cH79zJp1-lkp@intel.com/
Cc: Anton Ivanov <anton.ivanov(a)cambridgegreys.com>
Cc: Herve Codina <herve.codina(a)bootlin.com>
Cc: Jason A. Donenfeld <Jason(a)zx2c4.com>
Cc: Johannes Berg <johannes(a)sipsolutions.net>
Cc: Rasmus Villemoes <linux(a)rasmusvillemoes.dk>
Cc: Richard Weinberger <richard(a)nod.at>
Cc: Yang Guang <yang.guang5(a)zte.com.cn>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/um/os-Linux/sigio.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--- a/arch/um/os-Linux/sigio.c~revert-um-use-swap-to-make-code-cleaner
+++ a/arch/um/os-Linux/sigio.c
@@ -3,7 +3,6 @@
* Copyright (C) 2002 - 2008 Jeff Dike (jdike(a){addtoit,linux.intel}.com)
*/
-#include <linux/minmax.h>
#include <unistd.h>
#include <errno.h>
#include <fcntl.h>
@@ -51,7 +50,7 @@ static struct pollfds all_sigio_fds;
static int write_sigio_thread(void *unused)
{
- struct pollfds *fds;
+ struct pollfds *fds, tmp;
struct pollfd *p;
int i, n, respond_fd;
char c;
@@ -78,7 +77,9 @@ static int write_sigio_thread(void *unus
"write_sigio_thread : "
"read on socket failed, "
"err = %d\n", errno);
- swap(current_poll, next_poll);
+ tmp = current_poll;
+ current_poll = next_poll;
+ next_poll = tmp;
respond_fd = sigio_private[1];
}
else {
_
Patches currently in -mm which might be from andriy.shevchenko(a)linux.intel.com are
revert-um-use-swap-to-make-code-cleaner.patch
kernelh-split-out-count_args-and-concatenate-to-argsh.patch
x86-asm-replace-custom-count_args-concatenate-implementations.patch
arm64-smccc-replace-custom-count_args-concatenate-implementations.patch
genetlink-replace-custom-concatenate-implementation.patch
Hi,
as there are new hardware directives, we need a little adaptation
for the AUX invalidation sequence.
In this version we support all the engines affected by this
change.
The stable backport has some challenges because the original
patch that this series fixes has had more changes in between.
This patch is slowly exploding with code refactorings and
features added and fixed.
Thanks a lot Nirmoy, Andrzej and Matt for your review and for the
fruitful discussions!
Thanks,
Andi
Changelog:
=========
v7 -> v8
- Removed the aux invalidation from the device info and added a
helper function, instead (patch 2).
- Use "MTL and beyond" instead of "MTL+" in comments.
- Use the "gen12_" prefix instead of "intel_".
- In patch 6 return an int error instead of an error embedded in
the pointer in the intel_emit_pipe_control_cs() function and
propagate the error to the upper layers.
v6 -> v7
- Fix correct sequence applied to the correct engine. A little
confusion promptly cought by Nirmoy when applying to the VD
engine the sequence belonging to the compute engines. Thanks a
lot, Nirmoy!
v5 -> v6
- Fixed ccs flush in the engines VE and BCS. They are sent as a
separate command instead of added in the pipe control.
- Separated the CCS flusing in the pipe control patch with the
quiescing of the memory. They were meant to be on separate
patch already in the previous verision, but apparently I
squashed them by mistake.
v4 -> v5
- The AUX CCS is added as a device property instead of checking
against FLAT CCS. This adds the new HAS_AUX_CCS check
(Patch 2, new).
- little and trivial refactoring here and there.
- extended the flags{0,1}/bit_group_{0,1} renaming to other
functions.
- Created an intel_emit_pipe_control_cs() wrapper for submitting
the pipe control.
- Quiesce memory for all the engines, not just RCS (Patch 6,
new).
- The PIPE_CONTROL_CCS_FLUSH is added to all the engines.
- Remove redundant EMIT_FLUSH_CCS mode flag.
- Remove unnecessary NOOPs from the command streamer for
invalidating the CCS table.
- Use INVALID_MMIO_REG and gen12_get_aux_inv_reg() instad of
__MMIO(0) and reg.reg.
- Remove useless wrapper and just use gen12_get_aux_inv_reg().
v3 -> v4
- A trivial patch 3 is added to rename the flags with
bit_group_{0,1} to align with the datasheet naming.
- Patch 4 fixes a confusion I made where the CCS flag was
applied to the wrong bit group.
v2 -> v3
- added r-b from Nirmoy in patch 1 and 4.
- added patch 3 which enables the ccs_flush in the control pipe
for mtl+ compute and render engines.
- added redundant checks in patch 2 for enabling the EMIT_FLUSH
flag.
v1 -> v2
- add a clean up preliminary patch for the existing registers
- add support for more engines
- add the Fixes tag
Andi Shyti (7):
drm/i915/gt: Cleanup aux invalidation registers
drm/i915: Add the gen12_needs_ccs_aux_inv helper
drm/i915/gt: Rename flags with bit_group_X according to the datasheet
drm/i915/gt: Enable the CCS_FLUSH bit in the pipe control
drm/i915/gt: Refactor intel_emit_pipe_control_cs() in a single
function
drm/i915/gt: Ensure memory quiesced before invalidation for all
engines
drm/i915/gt: Support aux invalidation on all engines
Jonathan Cavitt (2):
drm/i915/gt: Ensure memory quiesced before invalidation
drm/i915/gt: Poll aux invalidation register bit on invalidation
drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 198 ++++++++++++-------
drivers/gpu/drm/i915/gt/gen8_engine_cs.h | 21 +-
drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 2 +
drivers/gpu/drm/i915/gt/intel_gt_regs.h | 16 +-
drivers/gpu/drm/i915/gt/intel_lrc.c | 17 +-
5 files changed, 155 insertions(+), 99 deletions(-)
--
2.40.1
Dpt objects that are created from internal get evicted when there is
memory pressure and do not get restored when pinned during scanout. The
pinned page table entries look corrupted and programming the display
engine with the incorrect pte's result in DE throwing pipe faults.
Create DPT objects from shmem and mark the object as dirty when pinning so
that the object is restored when shrinker evicts an unpinned buffer object.
v2: Unconditionally mark the dpt objects dirty during pinning(Chris).
Fixes: 0dc987b699ce ("drm/i915/display: Add smem fallback allocation for dpt")
Cc: <stable(a)vger.kernel.org> # v6.0+
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)linux.intel.com>
Suggested-by: Chris Wilson <chris.p.wilson(a)intel.com>
Signed-off-by: Fei Yang <fei.yang(a)intel.com>
Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada(a)intel.com>
---
drivers/gpu/drm/i915/display/intel_dpt.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c
index 7c5fddb203ba..fbfd8f959f17 100644
--- a/drivers/gpu/drm/i915/display/intel_dpt.c
+++ b/drivers/gpu/drm/i915/display/intel_dpt.c
@@ -166,6 +166,8 @@ struct i915_vma *intel_dpt_pin(struct i915_address_space *vm)
i915_vma_get(vma);
}
+ dpt->obj->mm.dirty = true;
+
atomic_dec(&i915->gpu_error.pending_fb_pin);
intel_runtime_pm_put(&i915->runtime_pm, wakeref);
@@ -261,7 +263,7 @@ intel_dpt_create(struct intel_framebuffer *fb)
dpt_obj = i915_gem_object_create_stolen(i915, size);
if (IS_ERR(dpt_obj) && !HAS_LMEM(i915)) {
drm_dbg_kms(&i915->drm, "Allocating dpt from smem\n");
- dpt_obj = i915_gem_object_create_internal(i915, size);
+ dpt_obj = i915_gem_object_create_shmem(i915, size);
}
if (IS_ERR(dpt_obj))
return ERR_CAST(dpt_obj);
--
2.34.1
From: hackyzh002 <hackyzh002(a)gmail.com>
[ Upstream commit f828b681d0cd566f86351c0b913e6cb6ed8c7b9c ]
The type of size is unsigned, if size is 0x40000000, there will be an
integer overflow, size will be zero after size *= sizeof(uint32_t),
will cause uninitialized memory to be referenced later
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: hackyzh002 <hackyzh002(a)gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/radeon/radeon_cs.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 1ae31dbc61c64..5e61abb3dce5c 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -265,7 +265,8 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
{
struct drm_radeon_cs *cs = data;
uint64_t *chunk_array_ptr;
- unsigned size, i;
+ u64 size;
+ unsigned i;
u32 ring = RADEON_CS_RING_GFX;
s32 priority = 0;
--
2.39.2
Hi Nirmoy,
> static int mtl_dummy_pipe_control(struct i915_request *rq)
> {
> /* Wa_14016712196 */
> if (IS_MTL_GRAPHICS_STEP(rq->engine->i915, M, STEP_A0, STEP_B0) ||
> - IS_MTL_GRAPHICS_STEP(rq->engine->i915, P, STEP_A0, STEP_B0)) {
> - u32 *cs;
> -
> - /* dummy PIPE_CONTROL + depth flush */
> - cs = intel_ring_begin(rq, 6);
> - if (IS_ERR(cs))
> - return PTR_ERR(cs);
> - cs = gen12_emit_pipe_control(cs,
> - 0,
> - PIPE_CONTROL_DEPTH_CACHE_FLUSH,
> - LRC_PPHWSP_SCRATCH_ADDR);
> - intel_ring_advance(rq, cs);
> - }
> + IS_MTL_GRAPHICS_STEP(rq->engine->i915, P, STEP_A0, STEP_B0))
> + return gen12_emit_pipe_control_cs(rq, 0,
> + PIPE_CONTROL_DEPTH_CACHE_FLUSH,
> + LRC_PPHWSP_SCRATCH_ADDR);
>
> Don't think this will get backported to 5.8+. I think mtl introduced after
> that.
>
> If that is not an issue for rest of the series and then
to be honest I don't think I fully understood the stable
policies, as in this series only two are the patches that are
really fixing things and the rest are only support.
In this case I think this will produce a conflict that will be
eventually fixed (... I guess!).
> Reviewed-by: Nirmoy Das <nirmoy.das(a)intel.com>
Thanks,
Andi
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 26efd79c4624294e553aeaa3439c646729bad084
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072114-giblet-unzip-f1db@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
26efd79c4624 ("ftrace: Fix possible warning on checking all pages used in ftrace_process_locs()")
db42523b4f3e ("ftrace: Store the order of pages allocated in ftrace_page")
59300b36f85f ("ftrace: Check if pages were allocated before calling free_pages()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 26efd79c4624294e553aeaa3439c646729bad084 Mon Sep 17 00:00:00 2001
From: Zheng Yejian <zhengyejian1(a)huawei.com>
Date: Wed, 12 Jul 2023 14:04:52 +0800
Subject: [PATCH] ftrace: Fix possible warning on checking all pages used in
ftrace_process_locs()
As comments in ftrace_process_locs(), there may be NULL pointers in
mcount_loc section:
> Some architecture linkers will pad between
> the different mcount_loc sections of different
> object files to satisfy alignments.
> Skip any NULL pointers.
After commit 20e5227e9f55 ("ftrace: allow NULL pointers in mcount_loc"),
NULL pointers will be accounted when allocating ftrace pages but skipped
before adding into ftrace pages, this may result in some pages not being
used. Then after commit 706c81f87f84 ("ftrace: Remove extra helper
functions"), warning may occur at:
WARN_ON(pg->next);
To fix it, only warn for case that no pointers skipped but pages not used
up, then free those unused pages after releasing ftrace_lock.
Link: https://lore.kernel.org/linux-trace-kernel/20230712060452.3175675-1-zhengye…
Cc: stable(a)vger.kernel.org
Fixes: 706c81f87f84 ("ftrace: Remove extra helper functions")
Suggested-by: Steven Rostedt <rostedt(a)goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1(a)huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 3740aca79fe7..05c0024815bf 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3305,6 +3305,22 @@ static int ftrace_allocate_records(struct ftrace_page *pg, int count)
return cnt;
}
+static void ftrace_free_pages(struct ftrace_page *pages)
+{
+ struct ftrace_page *pg = pages;
+
+ while (pg) {
+ if (pg->records) {
+ free_pages((unsigned long)pg->records, pg->order);
+ ftrace_number_of_pages -= 1 << pg->order;
+ }
+ pages = pg->next;
+ kfree(pg);
+ pg = pages;
+ ftrace_number_of_groups--;
+ }
+}
+
static struct ftrace_page *
ftrace_allocate_pages(unsigned long num_to_init)
{
@@ -3343,17 +3359,7 @@ ftrace_allocate_pages(unsigned long num_to_init)
return start_pg;
free_pages:
- pg = start_pg;
- while (pg) {
- if (pg->records) {
- free_pages((unsigned long)pg->records, pg->order);
- ftrace_number_of_pages -= 1 << pg->order;
- }
- start_pg = pg->next;
- kfree(pg);
- pg = start_pg;
- ftrace_number_of_groups--;
- }
+ ftrace_free_pages(start_pg);
pr_info("ftrace: FAILED to allocate memory for functions\n");
return NULL;
}
@@ -6471,9 +6477,11 @@ static int ftrace_process_locs(struct module *mod,
unsigned long *start,
unsigned long *end)
{
+ struct ftrace_page *pg_unuse = NULL;
struct ftrace_page *start_pg;
struct ftrace_page *pg;
struct dyn_ftrace *rec;
+ unsigned long skipped = 0;
unsigned long count;
unsigned long *p;
unsigned long addr;
@@ -6536,8 +6544,10 @@ static int ftrace_process_locs(struct module *mod,
* object files to satisfy alignments.
* Skip any NULL pointers.
*/
- if (!addr)
+ if (!addr) {
+ skipped++;
continue;
+ }
end_offset = (pg->index+1) * sizeof(pg->records[0]);
if (end_offset > PAGE_SIZE << pg->order) {
@@ -6551,8 +6561,10 @@ static int ftrace_process_locs(struct module *mod,
rec->ip = addr;
}
- /* We should have used all pages */
- WARN_ON(pg->next);
+ if (pg->next) {
+ pg_unuse = pg->next;
+ pg->next = NULL;
+ }
/* Assign the last page to ftrace_pages */
ftrace_pages = pg;
@@ -6574,6 +6586,11 @@ static int ftrace_process_locs(struct module *mod,
out:
mutex_unlock(&ftrace_lock);
+ /* We should have used all pages unless we skipped some */
+ if (pg_unuse) {
+ WARN_ON(!skipped);
+ ftrace_free_pages(pg_unuse);
+ }
return ret;
}
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 4481913607e58196c48a4fef5e6f45350684ec3c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072146-sports-deluge-22a1@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4481913607e58196c48a4fef5e6f45350684ec3c Mon Sep 17 00:00:00 2001
From: Yunxiang Li <Yunxiang.Li(a)amd.com>
Date: Thu, 22 Jun 2023 10:18:03 -0400
Subject: [PATCH] drm/ttm: fix bulk_move corruption when adding a entry
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When the resource is the first in the bulk_move range, adding it again
(thus moving it to the tail) will corrupt the list since the first
pointer is not moved. This eventually lead to null pointer deref in
ttm_lru_bulk_move_del()
Fixes: fee2ede15542 ("drm/ttm: rework bulk move handling v5")
Signed-off-by: Yunxiang Li <Yunxiang.Li(a)amd.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
CC: stable(a)vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20230622141902.28718-3-Yunxia…
Signed-off-by: Christian König <christian.koenig(a)amd.com>
diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
index 7333f7a87a2f..e51dbc7a2d53 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -86,6 +86,8 @@ static void ttm_lru_bulk_move_pos_tail(struct ttm_lru_bulk_move_pos *pos,
struct ttm_resource *res)
{
if (pos->last != res) {
+ if (pos->first == res)
+ pos->first = list_next_entry(res, lru);
list_move(&res->lru, &pos->last->lru);
pos->last = res;
}
@@ -111,7 +113,8 @@ static void ttm_lru_bulk_move_del(struct ttm_lru_bulk_move *bulk,
{
struct ttm_lru_bulk_move_pos *pos = ttm_lru_bulk_move_pos(bulk, res);
- if (unlikely(pos->first == res && pos->last == res)) {
+ if (unlikely(WARN_ON(!pos->first || !pos->last) ||
+ pos->first == res && pos->last == res)) {
pos->first = NULL;
pos->last = NULL;
} else if (pos->first == res) {
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 46f881b5b1758dc4a35fba4a643c10717d0cf427
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072412-operable-craving-317a@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
46f881b5b175 ("jbd2: fix a race when checking checkpoint buffer busy")
b98dba273a0e ("jbd2: remove journal_clean_one_cp_list()")
dbf2bab7935b ("jbd2: simplify journal_clean_one_cp_list()")
4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers")
214eb5a4d8a2 ("jbd2: remove redundant buffer io error checks")
2bf31d94423c ("jbd2: fix kernel-doc markups")
60ed633f51d0 ("jbd2: fix incorrect code style")
597599268e3b ("jbd2: discard dirty data when forgetting an un-journalled buffer")
f69120ce6c02 ("jbd2: fix sphinx kernel-doc build warnings")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46f881b5b1758dc4a35fba4a643c10717d0cf427 Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Tue, 6 Jun 2023 21:59:27 +0800
Subject: [PATCH] jbd2: fix a race when checking checkpoint buffer busy
Before removing checkpoint buffer from the t_checkpoint_list, we have to
check both BH_Dirty and BH_Lock bits together to distinguish buffers
have not been or were being written back. But __cp_buffer_busy() checks
them separately, it first check lock state and then check dirty, the
window between these two checks could be raced by writing back
procedure, which locks buffer and clears buffer dirty before I/O
completes. So it cannot guarantee checkpointing buffers been written
back to disk if some error happens later. Finally, it may clean
checkpoint transactions and lead to inconsistent filesystem.
jbd2_journal_forget() and __journal_try_to_free_buffer() also have the
same problem (journal_unmap_buffer() escape from this issue since it's
running under the buffer lock), so fix them through introducing a new
helper to try holding the buffer lock and remove really clean buffer.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Cc: stable(a)vger.kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 42b34cab64fb..9ec91017a7f3 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -376,11 +376,15 @@ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
jh = next_jh;
next_jh = jh->b_cpnext;
- if (!destroy && __cp_buffer_busy(jh))
- continue;
+ if (destroy) {
+ ret = __jbd2_journal_remove_checkpoint(jh);
+ } else {
+ ret = jbd2_journal_try_remove_checkpoint(jh);
+ if (ret < 0)
+ continue;
+ }
nr_freed++;
- ret = __jbd2_journal_remove_checkpoint(jh);
if (ret) {
*released = true;
break;
@@ -616,6 +620,34 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
return 1;
}
+/*
+ * Check the checkpoint buffer and try to remove it from the checkpoint
+ * list if it's clean. Returns -EBUSY if it is not clean, returns 1 if
+ * it frees the transaction, 0 otherwise.
+ *
+ * This function is called with j_list_lock held.
+ */
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh)
+{
+ struct buffer_head *bh = jh2bh(jh);
+
+ if (!trylock_buffer(bh))
+ return -EBUSY;
+ if (buffer_dirty(bh)) {
+ unlock_buffer(bh);
+ return -EBUSY;
+ }
+ unlock_buffer(bh);
+
+ /*
+ * Buffer is clean and the IO has finished (we held the buffer
+ * lock) so the checkpoint is done. We can safely remove the
+ * buffer from this transaction.
+ */
+ JBUFFER_TRACE(jh, "remove from checkpoint list");
+ return __jbd2_journal_remove_checkpoint(jh);
+}
+
/*
* journal_insert_checkpoint: put a committed buffer onto a checkpoint
* list so that we know when it is safe to clean the transaction out of
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 18611241f451..6ef5022949c4 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1784,8 +1784,7 @@ int jbd2_journal_forget(handle_t *handle, struct buffer_head *bh)
* Otherwise, if the buffer has been written to disk,
* it is safe to remove the checkpoint and drop it.
*/
- if (!buffer_dirty(bh)) {
- __jbd2_journal_remove_checkpoint(jh);
+ if (jbd2_journal_try_remove_checkpoint(jh) >= 0) {
spin_unlock(&journal->j_list_lock);
goto drop;
}
@@ -2112,20 +2111,14 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
jh = bh2jh(bh);
- if (buffer_locked(bh) || buffer_dirty(bh))
- goto out;
-
if (jh->b_next_transaction != NULL || jh->b_transaction != NULL)
- goto out;
+ return;
spin_lock(&journal->j_list_lock);
- if (jh->b_cp_transaction != NULL) {
- /* written-back checkpointed metadata buffer */
- JBUFFER_TRACE(jh, "remove from checkpoint list");
- __jbd2_journal_remove_checkpoint(jh);
- }
+ /* Remove written-back checkpointed metadata buffer */
+ if (jh->b_cp_transaction != NULL)
+ jbd2_journal_try_remove_checkpoint(jh);
spin_unlock(&journal->j_list_lock);
-out:
return;
}
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index bd660aac8e07..44c298aa58d4 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1443,6 +1443,7 @@ extern void jbd2_journal_commit_transaction(journal_t *);
void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy);
unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, unsigned long *nr_to_scan);
int __jbd2_journal_remove_checkpoint(struct journal_head *);
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh);
void jbd2_journal_destroy_checkpoint(journal_t *journal);
void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 46f881b5b1758dc4a35fba4a643c10717d0cf427
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072403-virtuous-straw-2f43@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
46f881b5b175 ("jbd2: fix a race when checking checkpoint buffer busy")
b98dba273a0e ("jbd2: remove journal_clean_one_cp_list()")
dbf2bab7935b ("jbd2: simplify journal_clean_one_cp_list()")
4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers")
214eb5a4d8a2 ("jbd2: remove redundant buffer io error checks")
2bf31d94423c ("jbd2: fix kernel-doc markups")
60ed633f51d0 ("jbd2: fix incorrect code style")
597599268e3b ("jbd2: discard dirty data when forgetting an un-journalled buffer")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46f881b5b1758dc4a35fba4a643c10717d0cf427 Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Tue, 6 Jun 2023 21:59:27 +0800
Subject: [PATCH] jbd2: fix a race when checking checkpoint buffer busy
Before removing checkpoint buffer from the t_checkpoint_list, we have to
check both BH_Dirty and BH_Lock bits together to distinguish buffers
have not been or were being written back. But __cp_buffer_busy() checks
them separately, it first check lock state and then check dirty, the
window between these two checks could be raced by writing back
procedure, which locks buffer and clears buffer dirty before I/O
completes. So it cannot guarantee checkpointing buffers been written
back to disk if some error happens later. Finally, it may clean
checkpoint transactions and lead to inconsistent filesystem.
jbd2_journal_forget() and __journal_try_to_free_buffer() also have the
same problem (journal_unmap_buffer() escape from this issue since it's
running under the buffer lock), so fix them through introducing a new
helper to try holding the buffer lock and remove really clean buffer.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Cc: stable(a)vger.kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 42b34cab64fb..9ec91017a7f3 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -376,11 +376,15 @@ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
jh = next_jh;
next_jh = jh->b_cpnext;
- if (!destroy && __cp_buffer_busy(jh))
- continue;
+ if (destroy) {
+ ret = __jbd2_journal_remove_checkpoint(jh);
+ } else {
+ ret = jbd2_journal_try_remove_checkpoint(jh);
+ if (ret < 0)
+ continue;
+ }
nr_freed++;
- ret = __jbd2_journal_remove_checkpoint(jh);
if (ret) {
*released = true;
break;
@@ -616,6 +620,34 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
return 1;
}
+/*
+ * Check the checkpoint buffer and try to remove it from the checkpoint
+ * list if it's clean. Returns -EBUSY if it is not clean, returns 1 if
+ * it frees the transaction, 0 otherwise.
+ *
+ * This function is called with j_list_lock held.
+ */
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh)
+{
+ struct buffer_head *bh = jh2bh(jh);
+
+ if (!trylock_buffer(bh))
+ return -EBUSY;
+ if (buffer_dirty(bh)) {
+ unlock_buffer(bh);
+ return -EBUSY;
+ }
+ unlock_buffer(bh);
+
+ /*
+ * Buffer is clean and the IO has finished (we held the buffer
+ * lock) so the checkpoint is done. We can safely remove the
+ * buffer from this transaction.
+ */
+ JBUFFER_TRACE(jh, "remove from checkpoint list");
+ return __jbd2_journal_remove_checkpoint(jh);
+}
+
/*
* journal_insert_checkpoint: put a committed buffer onto a checkpoint
* list so that we know when it is safe to clean the transaction out of
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 18611241f451..6ef5022949c4 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1784,8 +1784,7 @@ int jbd2_journal_forget(handle_t *handle, struct buffer_head *bh)
* Otherwise, if the buffer has been written to disk,
* it is safe to remove the checkpoint and drop it.
*/
- if (!buffer_dirty(bh)) {
- __jbd2_journal_remove_checkpoint(jh);
+ if (jbd2_journal_try_remove_checkpoint(jh) >= 0) {
spin_unlock(&journal->j_list_lock);
goto drop;
}
@@ -2112,20 +2111,14 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
jh = bh2jh(bh);
- if (buffer_locked(bh) || buffer_dirty(bh))
- goto out;
-
if (jh->b_next_transaction != NULL || jh->b_transaction != NULL)
- goto out;
+ return;
spin_lock(&journal->j_list_lock);
- if (jh->b_cp_transaction != NULL) {
- /* written-back checkpointed metadata buffer */
- JBUFFER_TRACE(jh, "remove from checkpoint list");
- __jbd2_journal_remove_checkpoint(jh);
- }
+ /* Remove written-back checkpointed metadata buffer */
+ if (jh->b_cp_transaction != NULL)
+ jbd2_journal_try_remove_checkpoint(jh);
spin_unlock(&journal->j_list_lock);
-out:
return;
}
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index bd660aac8e07..44c298aa58d4 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1443,6 +1443,7 @@ extern void jbd2_journal_commit_transaction(journal_t *);
void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy);
unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, unsigned long *nr_to_scan);
int __jbd2_journal_remove_checkpoint(struct journal_head *);
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh);
void jbd2_journal_destroy_checkpoint(journal_t *journal);
void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 46f881b5b1758dc4a35fba4a643c10717d0cf427
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072453-tricolor-employed-7622@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
46f881b5b175 ("jbd2: fix a race when checking checkpoint buffer busy")
b98dba273a0e ("jbd2: remove journal_clean_one_cp_list()")
dbf2bab7935b ("jbd2: simplify journal_clean_one_cp_list()")
4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers")
214eb5a4d8a2 ("jbd2: remove redundant buffer io error checks")
2bf31d94423c ("jbd2: fix kernel-doc markups")
60ed633f51d0 ("jbd2: fix incorrect code style")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46f881b5b1758dc4a35fba4a643c10717d0cf427 Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Tue, 6 Jun 2023 21:59:27 +0800
Subject: [PATCH] jbd2: fix a race when checking checkpoint buffer busy
Before removing checkpoint buffer from the t_checkpoint_list, we have to
check both BH_Dirty and BH_Lock bits together to distinguish buffers
have not been or were being written back. But __cp_buffer_busy() checks
them separately, it first check lock state and then check dirty, the
window between these two checks could be raced by writing back
procedure, which locks buffer and clears buffer dirty before I/O
completes. So it cannot guarantee checkpointing buffers been written
back to disk if some error happens later. Finally, it may clean
checkpoint transactions and lead to inconsistent filesystem.
jbd2_journal_forget() and __journal_try_to_free_buffer() also have the
same problem (journal_unmap_buffer() escape from this issue since it's
running under the buffer lock), so fix them through introducing a new
helper to try holding the buffer lock and remove really clean buffer.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Cc: stable(a)vger.kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 42b34cab64fb..9ec91017a7f3 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -376,11 +376,15 @@ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
jh = next_jh;
next_jh = jh->b_cpnext;
- if (!destroy && __cp_buffer_busy(jh))
- continue;
+ if (destroy) {
+ ret = __jbd2_journal_remove_checkpoint(jh);
+ } else {
+ ret = jbd2_journal_try_remove_checkpoint(jh);
+ if (ret < 0)
+ continue;
+ }
nr_freed++;
- ret = __jbd2_journal_remove_checkpoint(jh);
if (ret) {
*released = true;
break;
@@ -616,6 +620,34 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
return 1;
}
+/*
+ * Check the checkpoint buffer and try to remove it from the checkpoint
+ * list if it's clean. Returns -EBUSY if it is not clean, returns 1 if
+ * it frees the transaction, 0 otherwise.
+ *
+ * This function is called with j_list_lock held.
+ */
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh)
+{
+ struct buffer_head *bh = jh2bh(jh);
+
+ if (!trylock_buffer(bh))
+ return -EBUSY;
+ if (buffer_dirty(bh)) {
+ unlock_buffer(bh);
+ return -EBUSY;
+ }
+ unlock_buffer(bh);
+
+ /*
+ * Buffer is clean and the IO has finished (we held the buffer
+ * lock) so the checkpoint is done. We can safely remove the
+ * buffer from this transaction.
+ */
+ JBUFFER_TRACE(jh, "remove from checkpoint list");
+ return __jbd2_journal_remove_checkpoint(jh);
+}
+
/*
* journal_insert_checkpoint: put a committed buffer onto a checkpoint
* list so that we know when it is safe to clean the transaction out of
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 18611241f451..6ef5022949c4 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1784,8 +1784,7 @@ int jbd2_journal_forget(handle_t *handle, struct buffer_head *bh)
* Otherwise, if the buffer has been written to disk,
* it is safe to remove the checkpoint and drop it.
*/
- if (!buffer_dirty(bh)) {
- __jbd2_journal_remove_checkpoint(jh);
+ if (jbd2_journal_try_remove_checkpoint(jh) >= 0) {
spin_unlock(&journal->j_list_lock);
goto drop;
}
@@ -2112,20 +2111,14 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
jh = bh2jh(bh);
- if (buffer_locked(bh) || buffer_dirty(bh))
- goto out;
-
if (jh->b_next_transaction != NULL || jh->b_transaction != NULL)
- goto out;
+ return;
spin_lock(&journal->j_list_lock);
- if (jh->b_cp_transaction != NULL) {
- /* written-back checkpointed metadata buffer */
- JBUFFER_TRACE(jh, "remove from checkpoint list");
- __jbd2_journal_remove_checkpoint(jh);
- }
+ /* Remove written-back checkpointed metadata buffer */
+ if (jh->b_cp_transaction != NULL)
+ jbd2_journal_try_remove_checkpoint(jh);
spin_unlock(&journal->j_list_lock);
-out:
return;
}
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index bd660aac8e07..44c298aa58d4 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1443,6 +1443,7 @@ extern void jbd2_journal_commit_transaction(journal_t *);
void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy);
unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, unsigned long *nr_to_scan);
int __jbd2_journal_remove_checkpoint(struct journal_head *);
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh);
void jbd2_journal_destroy_checkpoint(journal_t *journal);
void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 46f881b5b1758dc4a35fba4a643c10717d0cf427
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072443-tackiness-pennant-36bc@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
46f881b5b175 ("jbd2: fix a race when checking checkpoint buffer busy")
b98dba273a0e ("jbd2: remove journal_clean_one_cp_list()")
dbf2bab7935b ("jbd2: simplify journal_clean_one_cp_list()")
4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers")
214eb5a4d8a2 ("jbd2: remove redundant buffer io error checks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46f881b5b1758dc4a35fba4a643c10717d0cf427 Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Tue, 6 Jun 2023 21:59:27 +0800
Subject: [PATCH] jbd2: fix a race when checking checkpoint buffer busy
Before removing checkpoint buffer from the t_checkpoint_list, we have to
check both BH_Dirty and BH_Lock bits together to distinguish buffers
have not been or were being written back. But __cp_buffer_busy() checks
them separately, it first check lock state and then check dirty, the
window between these two checks could be raced by writing back
procedure, which locks buffer and clears buffer dirty before I/O
completes. So it cannot guarantee checkpointing buffers been written
back to disk if some error happens later. Finally, it may clean
checkpoint transactions and lead to inconsistent filesystem.
jbd2_journal_forget() and __journal_try_to_free_buffer() also have the
same problem (journal_unmap_buffer() escape from this issue since it's
running under the buffer lock), so fix them through introducing a new
helper to try holding the buffer lock and remove really clean buffer.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Cc: stable(a)vger.kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 42b34cab64fb..9ec91017a7f3 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -376,11 +376,15 @@ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
jh = next_jh;
next_jh = jh->b_cpnext;
- if (!destroy && __cp_buffer_busy(jh))
- continue;
+ if (destroy) {
+ ret = __jbd2_journal_remove_checkpoint(jh);
+ } else {
+ ret = jbd2_journal_try_remove_checkpoint(jh);
+ if (ret < 0)
+ continue;
+ }
nr_freed++;
- ret = __jbd2_journal_remove_checkpoint(jh);
if (ret) {
*released = true;
break;
@@ -616,6 +620,34 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
return 1;
}
+/*
+ * Check the checkpoint buffer and try to remove it from the checkpoint
+ * list if it's clean. Returns -EBUSY if it is not clean, returns 1 if
+ * it frees the transaction, 0 otherwise.
+ *
+ * This function is called with j_list_lock held.
+ */
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh)
+{
+ struct buffer_head *bh = jh2bh(jh);
+
+ if (!trylock_buffer(bh))
+ return -EBUSY;
+ if (buffer_dirty(bh)) {
+ unlock_buffer(bh);
+ return -EBUSY;
+ }
+ unlock_buffer(bh);
+
+ /*
+ * Buffer is clean and the IO has finished (we held the buffer
+ * lock) so the checkpoint is done. We can safely remove the
+ * buffer from this transaction.
+ */
+ JBUFFER_TRACE(jh, "remove from checkpoint list");
+ return __jbd2_journal_remove_checkpoint(jh);
+}
+
/*
* journal_insert_checkpoint: put a committed buffer onto a checkpoint
* list so that we know when it is safe to clean the transaction out of
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 18611241f451..6ef5022949c4 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1784,8 +1784,7 @@ int jbd2_journal_forget(handle_t *handle, struct buffer_head *bh)
* Otherwise, if the buffer has been written to disk,
* it is safe to remove the checkpoint and drop it.
*/
- if (!buffer_dirty(bh)) {
- __jbd2_journal_remove_checkpoint(jh);
+ if (jbd2_journal_try_remove_checkpoint(jh) >= 0) {
spin_unlock(&journal->j_list_lock);
goto drop;
}
@@ -2112,20 +2111,14 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
jh = bh2jh(bh);
- if (buffer_locked(bh) || buffer_dirty(bh))
- goto out;
-
if (jh->b_next_transaction != NULL || jh->b_transaction != NULL)
- goto out;
+ return;
spin_lock(&journal->j_list_lock);
- if (jh->b_cp_transaction != NULL) {
- /* written-back checkpointed metadata buffer */
- JBUFFER_TRACE(jh, "remove from checkpoint list");
- __jbd2_journal_remove_checkpoint(jh);
- }
+ /* Remove written-back checkpointed metadata buffer */
+ if (jh->b_cp_transaction != NULL)
+ jbd2_journal_try_remove_checkpoint(jh);
spin_unlock(&journal->j_list_lock);
-out:
return;
}
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index bd660aac8e07..44c298aa58d4 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1443,6 +1443,7 @@ extern void jbd2_journal_commit_transaction(journal_t *);
void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy);
unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, unsigned long *nr_to_scan);
int __jbd2_journal_remove_checkpoint(struct journal_head *);
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh);
void jbd2_journal_destroy_checkpoint(journal_t *journal);
void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072441-deprive-glimpse-b891@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
5d5460fa7932 ("ext4: fix off by one issue in ext4_mb_choose_next_group_best_avail()")
f52f3d2b9fba ("ext4: Give symbolic names to mballoc criterias")
7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
1b4200112108 ("ext4: Avoid scanning smaller extents in BG during CR1")
3ef5d2638796 ("ext4: Add counter to track successful allocation of goal length")
fdd9a00943a5 ("ext4: Add per CR extent scanned counter")
4eb7a4a1a33b ("ext4: Convert mballoc cr (criteria) to enum")
c3defd99d58c ("ext4: treat stripe in block unit")
361eb69fc99f ("ext4: Remove the logic to trim inode PAs")
3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
a8e38fd37cff ("ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union")
93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
0830344c953a ("ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()")
7692094ac513 ("ext4: Move overlap assert logic into a separate function")
bcf434992145 ("ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()")
e86a718228b6 ("ext4: Stop searching if PA doesn't satisfy non-extent file")
91a48aaf59d0 ("ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request")
83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
4fca50d440cc ("ext4: make mballoc try target group first even with mb_optimize_scan")
cf4ff938b47f ("ext4: correct the judgment of BUG in ext4_mb_normalize_request")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8 Mon Sep 17 00:00:00 2001
From: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Date: Fri, 9 Jun 2023 16:04:03 +0530
Subject: [PATCH] ext4: fix off by one issue in
ext4_mb_choose_next_group_best_avail()
In ext4_mb_choose_next_group_best_avail(), we want the start order to be
1 less than goal length and the min_order to be, at max, 1 more than the
original length. This commit fixes an off by one issue that arose due to
the fact that 1 << fls(n) > (n).
After all the processing:
order = 1 order below goal len
min_order = maximum of the three:-
- order - trim_order
- 1 order below B2C(s_stripe)
- 1 order above original len
Cc: stable(a)kernel.org
Fixes: 33122aa930 ("ext4: Add allocation criteria 1.5 (CR1_5)")
Signed-off-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Link: https://lore.kernel.org/r/20230609103403.112807-1-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a2475b8c9fb5..456150ef6111 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1006,14 +1006,11 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
* fls() instead since we need to know the actual length while modifying
* goal length.
*/
- order = fls(ac->ac_g_ex.fe_len);
+ order = fls(ac->ac_g_ex.fe_len) - 1;
min_order = order - sbi->s_mb_best_avail_max_trim_order;
if (min_order < 0)
min_order = 0;
- if (1 << min_order < ac->ac_o_ex.fe_len)
- min_order = fls(ac->ac_o_ex.fe_len) + 1;
-
if (sbi->s_stripe > 0) {
/*
* We are assuming that stripe size is always a multiple of
@@ -1021,9 +1018,16 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
*/
num_stripe_clusters = EXT4_NUM_B2C(sbi, sbi->s_stripe);
if (1 << min_order < num_stripe_clusters)
- min_order = fls(num_stripe_clusters);
+ /*
+ * We consider 1 order less because later we round
+ * up the goal len to num_stripe_clusters
+ */
+ min_order = fls(num_stripe_clusters) - 1;
}
+ if (1 << min_order < ac->ac_o_ex.fe_len)
+ min_order = fls(ac->ac_o_ex.fe_len);
+
for (i = order; i >= min_order; i--) {
int frag_order;
/*
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072434-stylist-reflex-c474@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
5d5460fa7932 ("ext4: fix off by one issue in ext4_mb_choose_next_group_best_avail()")
f52f3d2b9fba ("ext4: Give symbolic names to mballoc criterias")
7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
1b4200112108 ("ext4: Avoid scanning smaller extents in BG during CR1")
3ef5d2638796 ("ext4: Add counter to track successful allocation of goal length")
fdd9a00943a5 ("ext4: Add per CR extent scanned counter")
4eb7a4a1a33b ("ext4: Convert mballoc cr (criteria) to enum")
c3defd99d58c ("ext4: treat stripe in block unit")
361eb69fc99f ("ext4: Remove the logic to trim inode PAs")
3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
a8e38fd37cff ("ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union")
93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
0830344c953a ("ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()")
7692094ac513 ("ext4: Move overlap assert logic into a separate function")
bcf434992145 ("ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()")
e86a718228b6 ("ext4: Stop searching if PA doesn't satisfy non-extent file")
91a48aaf59d0 ("ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request")
83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
4fca50d440cc ("ext4: make mballoc try target group first even with mb_optimize_scan")
cf4ff938b47f ("ext4: correct the judgment of BUG in ext4_mb_normalize_request")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8 Mon Sep 17 00:00:00 2001
From: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Date: Fri, 9 Jun 2023 16:04:03 +0530
Subject: [PATCH] ext4: fix off by one issue in
ext4_mb_choose_next_group_best_avail()
In ext4_mb_choose_next_group_best_avail(), we want the start order to be
1 less than goal length and the min_order to be, at max, 1 more than the
original length. This commit fixes an off by one issue that arose due to
the fact that 1 << fls(n) > (n).
After all the processing:
order = 1 order below goal len
min_order = maximum of the three:-
- order - trim_order
- 1 order below B2C(s_stripe)
- 1 order above original len
Cc: stable(a)kernel.org
Fixes: 33122aa930 ("ext4: Add allocation criteria 1.5 (CR1_5)")
Signed-off-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Link: https://lore.kernel.org/r/20230609103403.112807-1-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a2475b8c9fb5..456150ef6111 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1006,14 +1006,11 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
* fls() instead since we need to know the actual length while modifying
* goal length.
*/
- order = fls(ac->ac_g_ex.fe_len);
+ order = fls(ac->ac_g_ex.fe_len) - 1;
min_order = order - sbi->s_mb_best_avail_max_trim_order;
if (min_order < 0)
min_order = 0;
- if (1 << min_order < ac->ac_o_ex.fe_len)
- min_order = fls(ac->ac_o_ex.fe_len) + 1;
-
if (sbi->s_stripe > 0) {
/*
* We are assuming that stripe size is always a multiple of
@@ -1021,9 +1018,16 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
*/
num_stripe_clusters = EXT4_NUM_B2C(sbi, sbi->s_stripe);
if (1 << min_order < num_stripe_clusters)
- min_order = fls(num_stripe_clusters);
+ /*
+ * We consider 1 order less because later we round
+ * up the goal len to num_stripe_clusters
+ */
+ min_order = fls(num_stripe_clusters) - 1;
}
+ if (1 << min_order < ac->ac_o_ex.fe_len)
+ min_order = fls(ac->ac_o_ex.fe_len);
+
for (i = order; i >= min_order; i--) {
int frag_order;
/*
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 46f881b5b1758dc4a35fba4a643c10717d0cf427
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072433-obsessed-limeade-e4b9@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
46f881b5b175 ("jbd2: fix a race when checking checkpoint buffer busy")
b98dba273a0e ("jbd2: remove journal_clean_one_cp_list()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46f881b5b1758dc4a35fba4a643c10717d0cf427 Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Tue, 6 Jun 2023 21:59:27 +0800
Subject: [PATCH] jbd2: fix a race when checking checkpoint buffer busy
Before removing checkpoint buffer from the t_checkpoint_list, we have to
check both BH_Dirty and BH_Lock bits together to distinguish buffers
have not been or were being written back. But __cp_buffer_busy() checks
them separately, it first check lock state and then check dirty, the
window between these two checks could be raced by writing back
procedure, which locks buffer and clears buffer dirty before I/O
completes. So it cannot guarantee checkpointing buffers been written
back to disk if some error happens later. Finally, it may clean
checkpoint transactions and lead to inconsistent filesystem.
jbd2_journal_forget() and __journal_try_to_free_buffer() also have the
same problem (journal_unmap_buffer() escape from this issue since it's
running under the buffer lock), so fix them through introducing a new
helper to try holding the buffer lock and remove really clean buffer.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Cc: stable(a)vger.kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 42b34cab64fb..9ec91017a7f3 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -376,11 +376,15 @@ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
jh = next_jh;
next_jh = jh->b_cpnext;
- if (!destroy && __cp_buffer_busy(jh))
- continue;
+ if (destroy) {
+ ret = __jbd2_journal_remove_checkpoint(jh);
+ } else {
+ ret = jbd2_journal_try_remove_checkpoint(jh);
+ if (ret < 0)
+ continue;
+ }
nr_freed++;
- ret = __jbd2_journal_remove_checkpoint(jh);
if (ret) {
*released = true;
break;
@@ -616,6 +620,34 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
return 1;
}
+/*
+ * Check the checkpoint buffer and try to remove it from the checkpoint
+ * list if it's clean. Returns -EBUSY if it is not clean, returns 1 if
+ * it frees the transaction, 0 otherwise.
+ *
+ * This function is called with j_list_lock held.
+ */
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh)
+{
+ struct buffer_head *bh = jh2bh(jh);
+
+ if (!trylock_buffer(bh))
+ return -EBUSY;
+ if (buffer_dirty(bh)) {
+ unlock_buffer(bh);
+ return -EBUSY;
+ }
+ unlock_buffer(bh);
+
+ /*
+ * Buffer is clean and the IO has finished (we held the buffer
+ * lock) so the checkpoint is done. We can safely remove the
+ * buffer from this transaction.
+ */
+ JBUFFER_TRACE(jh, "remove from checkpoint list");
+ return __jbd2_journal_remove_checkpoint(jh);
+}
+
/*
* journal_insert_checkpoint: put a committed buffer onto a checkpoint
* list so that we know when it is safe to clean the transaction out of
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 18611241f451..6ef5022949c4 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1784,8 +1784,7 @@ int jbd2_journal_forget(handle_t *handle, struct buffer_head *bh)
* Otherwise, if the buffer has been written to disk,
* it is safe to remove the checkpoint and drop it.
*/
- if (!buffer_dirty(bh)) {
- __jbd2_journal_remove_checkpoint(jh);
+ if (jbd2_journal_try_remove_checkpoint(jh) >= 0) {
spin_unlock(&journal->j_list_lock);
goto drop;
}
@@ -2112,20 +2111,14 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
jh = bh2jh(bh);
- if (buffer_locked(bh) || buffer_dirty(bh))
- goto out;
-
if (jh->b_next_transaction != NULL || jh->b_transaction != NULL)
- goto out;
+ return;
spin_lock(&journal->j_list_lock);
- if (jh->b_cp_transaction != NULL) {
- /* written-back checkpointed metadata buffer */
- JBUFFER_TRACE(jh, "remove from checkpoint list");
- __jbd2_journal_remove_checkpoint(jh);
- }
+ /* Remove written-back checkpointed metadata buffer */
+ if (jh->b_cp_transaction != NULL)
+ jbd2_journal_try_remove_checkpoint(jh);
spin_unlock(&journal->j_list_lock);
-out:
return;
}
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index bd660aac8e07..44c298aa58d4 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1443,6 +1443,7 @@ extern void jbd2_journal_commit_transaction(journal_t *);
void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy);
unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, unsigned long *nr_to_scan);
int __jbd2_journal_remove_checkpoint(struct journal_head *);
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh);
void jbd2_journal_destroy_checkpoint(journal_t *journal);
void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072427-curler-operate-a8d0@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
5d5460fa7932 ("ext4: fix off by one issue in ext4_mb_choose_next_group_best_avail()")
f52f3d2b9fba ("ext4: Give symbolic names to mballoc criterias")
7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
1b4200112108 ("ext4: Avoid scanning smaller extents in BG during CR1")
3ef5d2638796 ("ext4: Add counter to track successful allocation of goal length")
fdd9a00943a5 ("ext4: Add per CR extent scanned counter")
4eb7a4a1a33b ("ext4: Convert mballoc cr (criteria) to enum")
c3defd99d58c ("ext4: treat stripe in block unit")
361eb69fc99f ("ext4: Remove the logic to trim inode PAs")
3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
a8e38fd37cff ("ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union")
93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
0830344c953a ("ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()")
7692094ac513 ("ext4: Move overlap assert logic into a separate function")
bcf434992145 ("ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()")
e86a718228b6 ("ext4: Stop searching if PA doesn't satisfy non-extent file")
91a48aaf59d0 ("ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request")
83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
4fca50d440cc ("ext4: make mballoc try target group first even with mb_optimize_scan")
cf4ff938b47f ("ext4: correct the judgment of BUG in ext4_mb_normalize_request")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8 Mon Sep 17 00:00:00 2001
From: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Date: Fri, 9 Jun 2023 16:04:03 +0530
Subject: [PATCH] ext4: fix off by one issue in
ext4_mb_choose_next_group_best_avail()
In ext4_mb_choose_next_group_best_avail(), we want the start order to be
1 less than goal length and the min_order to be, at max, 1 more than the
original length. This commit fixes an off by one issue that arose due to
the fact that 1 << fls(n) > (n).
After all the processing:
order = 1 order below goal len
min_order = maximum of the three:-
- order - trim_order
- 1 order below B2C(s_stripe)
- 1 order above original len
Cc: stable(a)kernel.org
Fixes: 33122aa930 ("ext4: Add allocation criteria 1.5 (CR1_5)")
Signed-off-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Link: https://lore.kernel.org/r/20230609103403.112807-1-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a2475b8c9fb5..456150ef6111 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1006,14 +1006,11 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
* fls() instead since we need to know the actual length while modifying
* goal length.
*/
- order = fls(ac->ac_g_ex.fe_len);
+ order = fls(ac->ac_g_ex.fe_len) - 1;
min_order = order - sbi->s_mb_best_avail_max_trim_order;
if (min_order < 0)
min_order = 0;
- if (1 << min_order < ac->ac_o_ex.fe_len)
- min_order = fls(ac->ac_o_ex.fe_len) + 1;
-
if (sbi->s_stripe > 0) {
/*
* We are assuming that stripe size is always a multiple of
@@ -1021,9 +1018,16 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
*/
num_stripe_clusters = EXT4_NUM_B2C(sbi, sbi->s_stripe);
if (1 << min_order < num_stripe_clusters)
- min_order = fls(num_stripe_clusters);
+ /*
+ * We consider 1 order less because later we round
+ * up the goal len to num_stripe_clusters
+ */
+ min_order = fls(num_stripe_clusters) - 1;
}
+ if (1 << min_order < ac->ac_o_ex.fe_len)
+ min_order = fls(ac->ac_o_ex.fe_len);
+
for (i = order; i >= min_order; i--) {
int frag_order;
/*
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 46f881b5b1758dc4a35fba4a643c10717d0cf427
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072424-dejected-raisin-2dd7@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
46f881b5b175 ("jbd2: fix a race when checking checkpoint buffer busy")
b98dba273a0e ("jbd2: remove journal_clean_one_cp_list()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46f881b5b1758dc4a35fba4a643c10717d0cf427 Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Tue, 6 Jun 2023 21:59:27 +0800
Subject: [PATCH] jbd2: fix a race when checking checkpoint buffer busy
Before removing checkpoint buffer from the t_checkpoint_list, we have to
check both BH_Dirty and BH_Lock bits together to distinguish buffers
have not been or were being written back. But __cp_buffer_busy() checks
them separately, it first check lock state and then check dirty, the
window between these two checks could be raced by writing back
procedure, which locks buffer and clears buffer dirty before I/O
completes. So it cannot guarantee checkpointing buffers been written
back to disk if some error happens later. Finally, it may clean
checkpoint transactions and lead to inconsistent filesystem.
jbd2_journal_forget() and __journal_try_to_free_buffer() also have the
same problem (journal_unmap_buffer() escape from this issue since it's
running under the buffer lock), so fix them through introducing a new
helper to try holding the buffer lock and remove really clean buffer.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Cc: stable(a)vger.kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 42b34cab64fb..9ec91017a7f3 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -376,11 +376,15 @@ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
jh = next_jh;
next_jh = jh->b_cpnext;
- if (!destroy && __cp_buffer_busy(jh))
- continue;
+ if (destroy) {
+ ret = __jbd2_journal_remove_checkpoint(jh);
+ } else {
+ ret = jbd2_journal_try_remove_checkpoint(jh);
+ if (ret < 0)
+ continue;
+ }
nr_freed++;
- ret = __jbd2_journal_remove_checkpoint(jh);
if (ret) {
*released = true;
break;
@@ -616,6 +620,34 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
return 1;
}
+/*
+ * Check the checkpoint buffer and try to remove it from the checkpoint
+ * list if it's clean. Returns -EBUSY if it is not clean, returns 1 if
+ * it frees the transaction, 0 otherwise.
+ *
+ * This function is called with j_list_lock held.
+ */
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh)
+{
+ struct buffer_head *bh = jh2bh(jh);
+
+ if (!trylock_buffer(bh))
+ return -EBUSY;
+ if (buffer_dirty(bh)) {
+ unlock_buffer(bh);
+ return -EBUSY;
+ }
+ unlock_buffer(bh);
+
+ /*
+ * Buffer is clean and the IO has finished (we held the buffer
+ * lock) so the checkpoint is done. We can safely remove the
+ * buffer from this transaction.
+ */
+ JBUFFER_TRACE(jh, "remove from checkpoint list");
+ return __jbd2_journal_remove_checkpoint(jh);
+}
+
/*
* journal_insert_checkpoint: put a committed buffer onto a checkpoint
* list so that we know when it is safe to clean the transaction out of
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 18611241f451..6ef5022949c4 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1784,8 +1784,7 @@ int jbd2_journal_forget(handle_t *handle, struct buffer_head *bh)
* Otherwise, if the buffer has been written to disk,
* it is safe to remove the checkpoint and drop it.
*/
- if (!buffer_dirty(bh)) {
- __jbd2_journal_remove_checkpoint(jh);
+ if (jbd2_journal_try_remove_checkpoint(jh) >= 0) {
spin_unlock(&journal->j_list_lock);
goto drop;
}
@@ -2112,20 +2111,14 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
jh = bh2jh(bh);
- if (buffer_locked(bh) || buffer_dirty(bh))
- goto out;
-
if (jh->b_next_transaction != NULL || jh->b_transaction != NULL)
- goto out;
+ return;
spin_lock(&journal->j_list_lock);
- if (jh->b_cp_transaction != NULL) {
- /* written-back checkpointed metadata buffer */
- JBUFFER_TRACE(jh, "remove from checkpoint list");
- __jbd2_journal_remove_checkpoint(jh);
- }
+ /* Remove written-back checkpointed metadata buffer */
+ if (jh->b_cp_transaction != NULL)
+ jbd2_journal_try_remove_checkpoint(jh);
spin_unlock(&journal->j_list_lock);
-out:
return;
}
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index bd660aac8e07..44c298aa58d4 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1443,6 +1443,7 @@ extern void jbd2_journal_commit_transaction(journal_t *);
void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy);
unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, unsigned long *nr_to_scan);
int __jbd2_journal_remove_checkpoint(struct journal_head *);
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh);
void jbd2_journal_destroy_checkpoint(journal_t *journal);
void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072419-silt-kinswoman-c896@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
5d5460fa7932 ("ext4: fix off by one issue in ext4_mb_choose_next_group_best_avail()")
f52f3d2b9fba ("ext4: Give symbolic names to mballoc criterias")
7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
1b4200112108 ("ext4: Avoid scanning smaller extents in BG during CR1")
3ef5d2638796 ("ext4: Add counter to track successful allocation of goal length")
fdd9a00943a5 ("ext4: Add per CR extent scanned counter")
4eb7a4a1a33b ("ext4: Convert mballoc cr (criteria) to enum")
c3defd99d58c ("ext4: treat stripe in block unit")
361eb69fc99f ("ext4: Remove the logic to trim inode PAs")
3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
a8e38fd37cff ("ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union")
93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
0830344c953a ("ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()")
7692094ac513 ("ext4: Move overlap assert logic into a separate function")
bcf434992145 ("ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()")
e86a718228b6 ("ext4: Stop searching if PA doesn't satisfy non-extent file")
91a48aaf59d0 ("ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request")
83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
4fca50d440cc ("ext4: make mballoc try target group first even with mb_optimize_scan")
cf4ff938b47f ("ext4: correct the judgment of BUG in ext4_mb_normalize_request")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8 Mon Sep 17 00:00:00 2001
From: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Date: Fri, 9 Jun 2023 16:04:03 +0530
Subject: [PATCH] ext4: fix off by one issue in
ext4_mb_choose_next_group_best_avail()
In ext4_mb_choose_next_group_best_avail(), we want the start order to be
1 less than goal length and the min_order to be, at max, 1 more than the
original length. This commit fixes an off by one issue that arose due to
the fact that 1 << fls(n) > (n).
After all the processing:
order = 1 order below goal len
min_order = maximum of the three:-
- order - trim_order
- 1 order below B2C(s_stripe)
- 1 order above original len
Cc: stable(a)kernel.org
Fixes: 33122aa930 ("ext4: Add allocation criteria 1.5 (CR1_5)")
Signed-off-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Link: https://lore.kernel.org/r/20230609103403.112807-1-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a2475b8c9fb5..456150ef6111 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1006,14 +1006,11 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
* fls() instead since we need to know the actual length while modifying
* goal length.
*/
- order = fls(ac->ac_g_ex.fe_len);
+ order = fls(ac->ac_g_ex.fe_len) - 1;
min_order = order - sbi->s_mb_best_avail_max_trim_order;
if (min_order < 0)
min_order = 0;
- if (1 << min_order < ac->ac_o_ex.fe_len)
- min_order = fls(ac->ac_o_ex.fe_len) + 1;
-
if (sbi->s_stripe > 0) {
/*
* We are assuming that stripe size is always a multiple of
@@ -1021,9 +1018,16 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
*/
num_stripe_clusters = EXT4_NUM_B2C(sbi, sbi->s_stripe);
if (1 << min_order < num_stripe_clusters)
- min_order = fls(num_stripe_clusters);
+ /*
+ * We consider 1 order less because later we round
+ * up the goal len to num_stripe_clusters
+ */
+ min_order = fls(num_stripe_clusters) - 1;
}
+ if (1 << min_order < ac->ac_o_ex.fe_len)
+ min_order = fls(ac->ac_o_ex.fe_len);
+
for (i = order; i >= min_order; i--) {
int frag_order;
/*
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 46f881b5b1758dc4a35fba4a643c10717d0cf427
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072415-dyslexic-spoilage-fb58@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46f881b5b1758dc4a35fba4a643c10717d0cf427 Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Tue, 6 Jun 2023 21:59:27 +0800
Subject: [PATCH] jbd2: fix a race when checking checkpoint buffer busy
Before removing checkpoint buffer from the t_checkpoint_list, we have to
check both BH_Dirty and BH_Lock bits together to distinguish buffers
have not been or were being written back. But __cp_buffer_busy() checks
them separately, it first check lock state and then check dirty, the
window between these two checks could be raced by writing back
procedure, which locks buffer and clears buffer dirty before I/O
completes. So it cannot guarantee checkpointing buffers been written
back to disk if some error happens later. Finally, it may clean
checkpoint transactions and lead to inconsistent filesystem.
jbd2_journal_forget() and __journal_try_to_free_buffer() also have the
same problem (journal_unmap_buffer() escape from this issue since it's
running under the buffer lock), so fix them through introducing a new
helper to try holding the buffer lock and remove really clean buffer.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Cc: stable(a)vger.kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 42b34cab64fb..9ec91017a7f3 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -376,11 +376,15 @@ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
jh = next_jh;
next_jh = jh->b_cpnext;
- if (!destroy && __cp_buffer_busy(jh))
- continue;
+ if (destroy) {
+ ret = __jbd2_journal_remove_checkpoint(jh);
+ } else {
+ ret = jbd2_journal_try_remove_checkpoint(jh);
+ if (ret < 0)
+ continue;
+ }
nr_freed++;
- ret = __jbd2_journal_remove_checkpoint(jh);
if (ret) {
*released = true;
break;
@@ -616,6 +620,34 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
return 1;
}
+/*
+ * Check the checkpoint buffer and try to remove it from the checkpoint
+ * list if it's clean. Returns -EBUSY if it is not clean, returns 1 if
+ * it frees the transaction, 0 otherwise.
+ *
+ * This function is called with j_list_lock held.
+ */
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh)
+{
+ struct buffer_head *bh = jh2bh(jh);
+
+ if (!trylock_buffer(bh))
+ return -EBUSY;
+ if (buffer_dirty(bh)) {
+ unlock_buffer(bh);
+ return -EBUSY;
+ }
+ unlock_buffer(bh);
+
+ /*
+ * Buffer is clean and the IO has finished (we held the buffer
+ * lock) so the checkpoint is done. We can safely remove the
+ * buffer from this transaction.
+ */
+ JBUFFER_TRACE(jh, "remove from checkpoint list");
+ return __jbd2_journal_remove_checkpoint(jh);
+}
+
/*
* journal_insert_checkpoint: put a committed buffer onto a checkpoint
* list so that we know when it is safe to clean the transaction out of
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 18611241f451..6ef5022949c4 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1784,8 +1784,7 @@ int jbd2_journal_forget(handle_t *handle, struct buffer_head *bh)
* Otherwise, if the buffer has been written to disk,
* it is safe to remove the checkpoint and drop it.
*/
- if (!buffer_dirty(bh)) {
- __jbd2_journal_remove_checkpoint(jh);
+ if (jbd2_journal_try_remove_checkpoint(jh) >= 0) {
spin_unlock(&journal->j_list_lock);
goto drop;
}
@@ -2112,20 +2111,14 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
jh = bh2jh(bh);
- if (buffer_locked(bh) || buffer_dirty(bh))
- goto out;
-
if (jh->b_next_transaction != NULL || jh->b_transaction != NULL)
- goto out;
+ return;
spin_lock(&journal->j_list_lock);
- if (jh->b_cp_transaction != NULL) {
- /* written-back checkpointed metadata buffer */
- JBUFFER_TRACE(jh, "remove from checkpoint list");
- __jbd2_journal_remove_checkpoint(jh);
- }
+ /* Remove written-back checkpointed metadata buffer */
+ if (jh->b_cp_transaction != NULL)
+ jbd2_journal_try_remove_checkpoint(jh);
spin_unlock(&journal->j_list_lock);
-out:
return;
}
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index bd660aac8e07..44c298aa58d4 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1443,6 +1443,7 @@ extern void jbd2_journal_commit_transaction(journal_t *);
void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy);
unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, unsigned long *nr_to_scan);
int __jbd2_journal_remove_checkpoint(struct journal_head *);
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh);
void jbd2_journal_destroy_checkpoint(journal_t *journal);
void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072412-raft-circle-8700@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
5d5460fa7932 ("ext4: fix off by one issue in ext4_mb_choose_next_group_best_avail()")
f52f3d2b9fba ("ext4: Give symbolic names to mballoc criterias")
7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
1b4200112108 ("ext4: Avoid scanning smaller extents in BG during CR1")
3ef5d2638796 ("ext4: Add counter to track successful allocation of goal length")
fdd9a00943a5 ("ext4: Add per CR extent scanned counter")
4eb7a4a1a33b ("ext4: Convert mballoc cr (criteria) to enum")
c3defd99d58c ("ext4: treat stripe in block unit")
361eb69fc99f ("ext4: Remove the logic to trim inode PAs")
3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
a8e38fd37cff ("ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union")
93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
0830344c953a ("ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()")
7692094ac513 ("ext4: Move overlap assert logic into a separate function")
bcf434992145 ("ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()")
e86a718228b6 ("ext4: Stop searching if PA doesn't satisfy non-extent file")
91a48aaf59d0 ("ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request")
83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
4fca50d440cc ("ext4: make mballoc try target group first even with mb_optimize_scan")
cf4ff938b47f ("ext4: correct the judgment of BUG in ext4_mb_normalize_request")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8 Mon Sep 17 00:00:00 2001
From: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Date: Fri, 9 Jun 2023 16:04:03 +0530
Subject: [PATCH] ext4: fix off by one issue in
ext4_mb_choose_next_group_best_avail()
In ext4_mb_choose_next_group_best_avail(), we want the start order to be
1 less than goal length and the min_order to be, at max, 1 more than the
original length. This commit fixes an off by one issue that arose due to
the fact that 1 << fls(n) > (n).
After all the processing:
order = 1 order below goal len
min_order = maximum of the three:-
- order - trim_order
- 1 order below B2C(s_stripe)
- 1 order above original len
Cc: stable(a)kernel.org
Fixes: 33122aa930 ("ext4: Add allocation criteria 1.5 (CR1_5)")
Signed-off-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Link: https://lore.kernel.org/r/20230609103403.112807-1-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a2475b8c9fb5..456150ef6111 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1006,14 +1006,11 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
* fls() instead since we need to know the actual length while modifying
* goal length.
*/
- order = fls(ac->ac_g_ex.fe_len);
+ order = fls(ac->ac_g_ex.fe_len) - 1;
min_order = order - sbi->s_mb_best_avail_max_trim_order;
if (min_order < 0)
min_order = 0;
- if (1 << min_order < ac->ac_o_ex.fe_len)
- min_order = fls(ac->ac_o_ex.fe_len) + 1;
-
if (sbi->s_stripe > 0) {
/*
* We are assuming that stripe size is always a multiple of
@@ -1021,9 +1018,16 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
*/
num_stripe_clusters = EXT4_NUM_B2C(sbi, sbi->s_stripe);
if (1 << min_order < num_stripe_clusters)
- min_order = fls(num_stripe_clusters);
+ /*
+ * We consider 1 order less because later we round
+ * up the goal len to num_stripe_clusters
+ */
+ min_order = fls(num_stripe_clusters) - 1;
}
+ if (1 << min_order < ac->ac_o_ex.fe_len)
+ min_order = fls(ac->ac_o_ex.fe_len);
+
for (i = order; i >= min_order; i--) {
int frag_order;
/*
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072404-hubcap-railroad-9c52@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
5d5460fa7932 ("ext4: fix off by one issue in ext4_mb_choose_next_group_best_avail()")
f52f3d2b9fba ("ext4: Give symbolic names to mballoc criterias")
7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
1b4200112108 ("ext4: Avoid scanning smaller extents in BG during CR1")
3ef5d2638796 ("ext4: Add counter to track successful allocation of goal length")
fdd9a00943a5 ("ext4: Add per CR extent scanned counter")
4eb7a4a1a33b ("ext4: Convert mballoc cr (criteria) to enum")
c3defd99d58c ("ext4: treat stripe in block unit")
361eb69fc99f ("ext4: Remove the logic to trim inode PAs")
3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
a8e38fd37cff ("ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union")
93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
0830344c953a ("ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()")
7692094ac513 ("ext4: Move overlap assert logic into a separate function")
bcf434992145 ("ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()")
e86a718228b6 ("ext4: Stop searching if PA doesn't satisfy non-extent file")
91a48aaf59d0 ("ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8 Mon Sep 17 00:00:00 2001
From: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Date: Fri, 9 Jun 2023 16:04:03 +0530
Subject: [PATCH] ext4: fix off by one issue in
ext4_mb_choose_next_group_best_avail()
In ext4_mb_choose_next_group_best_avail(), we want the start order to be
1 less than goal length and the min_order to be, at max, 1 more than the
original length. This commit fixes an off by one issue that arose due to
the fact that 1 << fls(n) > (n).
After all the processing:
order = 1 order below goal len
min_order = maximum of the three:-
- order - trim_order
- 1 order below B2C(s_stripe)
- 1 order above original len
Cc: stable(a)kernel.org
Fixes: 33122aa930 ("ext4: Add allocation criteria 1.5 (CR1_5)")
Signed-off-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Link: https://lore.kernel.org/r/20230609103403.112807-1-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a2475b8c9fb5..456150ef6111 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1006,14 +1006,11 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
* fls() instead since we need to know the actual length while modifying
* goal length.
*/
- order = fls(ac->ac_g_ex.fe_len);
+ order = fls(ac->ac_g_ex.fe_len) - 1;
min_order = order - sbi->s_mb_best_avail_max_trim_order;
if (min_order < 0)
min_order = 0;
- if (1 << min_order < ac->ac_o_ex.fe_len)
- min_order = fls(ac->ac_o_ex.fe_len) + 1;
-
if (sbi->s_stripe > 0) {
/*
* We are assuming that stripe size is always a multiple of
@@ -1021,9 +1018,16 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
*/
num_stripe_clusters = EXT4_NUM_B2C(sbi, sbi->s_stripe);
if (1 << min_order < num_stripe_clusters)
- min_order = fls(num_stripe_clusters);
+ /*
+ * We consider 1 order less because later we round
+ * up the goal len to num_stripe_clusters
+ */
+ min_order = fls(num_stripe_clusters) - 1;
}
+ if (1 << min_order < ac->ac_o_ex.fe_len)
+ min_order = fls(ac->ac_o_ex.fe_len);
+
for (i = order; i >= min_order; i--) {
int frag_order;
/*
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x c2d6fd9d6f35079f1669f0100f05b46708c74b7f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072459-unmolded-serrated-1862@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
c2d6fd9d6f35 ("jbd2: recheck chechpointing non-dirty buffer")
4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers")
214eb5a4d8a2 ("jbd2: remove redundant buffer io error checks")
2bf31d94423c ("jbd2: fix kernel-doc markups")
60ed633f51d0 ("jbd2: fix incorrect code style")
ccd3c4373eac ("jbd2: fix use after free in jbd2_log_do_checkpoint()")
f69120ce6c02 ("jbd2: fix sphinx kernel-doc build warnings")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c2d6fd9d6f35079f1669f0100f05b46708c74b7f Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Tue, 6 Jun 2023 21:59:23 +0800
Subject: [PATCH] jbd2: recheck chechpointing non-dirty buffer
There is a long-standing metadata corruption issue that happens from
time to time, but it's very difficult to reproduce and analyse, benefit
from the JBD2_CYCLE_RECORD option, we found out that the problem is the
checkpointing process miss to write out some buffers which are raced by
another do_get_write_access(). Looks below for detail.
jbd2_log_do_checkpoint() //transaction X
//buffer A is dirty and not belones to any transaction
__buffer_relink_io() //move it to the IO list
__flush_batch()
write_dirty_buffer()
do_get_write_access()
clear_buffer_dirty
__jbd2_journal_file_buffer()
//add buffer A to a new transaction Y
lock_buffer(bh)
//doesn't write out
__jbd2_journal_remove_checkpoint()
//finish checkpoint except buffer A
//filesystem corrupt if the new transaction Y isn't fully write out.
Due to the t_checkpoint_list walking loop in jbd2_log_do_checkpoint()
have already handles waiting for buffers under IO and re-added new
transaction to complete commit, and it also removing cleaned buffers,
this makes sure the list will eventually get empty. So it's fine to
leave buffers on the t_checkpoint_list while flushing out and completely
stop using the t_checkpoint_io_list.
Cc: stable(a)vger.kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Tested-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 51bd38da21cd..25e3c20eb19f 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -57,28 +57,6 @@ static inline void __buffer_unlink(struct journal_head *jh)
}
}
-/*
- * Move a buffer from the checkpoint list to the checkpoint io list
- *
- * Called with j_list_lock held
- */
-static inline void __buffer_relink_io(struct journal_head *jh)
-{
- transaction_t *transaction = jh->b_cp_transaction;
-
- __buffer_unlink_first(jh);
-
- if (!transaction->t_checkpoint_io_list) {
- jh->b_cpnext = jh->b_cpprev = jh;
- } else {
- jh->b_cpnext = transaction->t_checkpoint_io_list;
- jh->b_cpprev = transaction->t_checkpoint_io_list->b_cpprev;
- jh->b_cpprev->b_cpnext = jh;
- jh->b_cpnext->b_cpprev = jh;
- }
- transaction->t_checkpoint_io_list = jh;
-}
-
/*
* Check a checkpoint buffer could be release or not.
*
@@ -183,6 +161,7 @@ __flush_batch(journal_t *journal, int *batch_count)
struct buffer_head *bh = journal->j_chkpt_bhs[i];
BUFFER_TRACE(bh, "brelse");
__brelse(bh);
+ journal->j_chkpt_bhs[i] = NULL;
}
*batch_count = 0;
}
@@ -242,6 +221,11 @@ int jbd2_log_do_checkpoint(journal_t *journal)
jh = transaction->t_checkpoint_list;
bh = jh2bh(jh);
+ /*
+ * The buffer may be writing back, or flushing out in the
+ * last couple of cycles, or re-adding into a new transaction,
+ * need to check it again until it's unlocked.
+ */
if (buffer_locked(bh)) {
get_bh(bh);
spin_unlock(&journal->j_list_lock);
@@ -287,28 +271,32 @@ int jbd2_log_do_checkpoint(journal_t *journal)
}
if (!buffer_dirty(bh)) {
BUFFER_TRACE(bh, "remove from checkpoint");
- if (__jbd2_journal_remove_checkpoint(jh))
- /* The transaction was released; we're done */
+ /*
+ * If the transaction was released or the checkpoint
+ * list was empty, we're done.
+ */
+ if (__jbd2_journal_remove_checkpoint(jh) ||
+ !transaction->t_checkpoint_list)
goto out;
- continue;
+ } else {
+ /*
+ * We are about to write the buffer, it could be
+ * raced by some other transaction shrink or buffer
+ * re-log logic once we release the j_list_lock,
+ * leave it on the checkpoint list and check status
+ * again to make sure it's clean.
+ */
+ BUFFER_TRACE(bh, "queue");
+ get_bh(bh);
+ J_ASSERT_BH(bh, !buffer_jwrite(bh));
+ journal->j_chkpt_bhs[batch_count++] = bh;
+ transaction->t_chp_stats.cs_written++;
+ transaction->t_checkpoint_list = jh->b_cpnext;
}
- /*
- * Important: we are about to write the buffer, and
- * possibly block, while still holding the journal
- * lock. We cannot afford to let the transaction
- * logic start messing around with this buffer before
- * we write it to disk, as that would break
- * recoverability.
- */
- BUFFER_TRACE(bh, "queue");
- get_bh(bh);
- J_ASSERT_BH(bh, !buffer_jwrite(bh));
- journal->j_chkpt_bhs[batch_count++] = bh;
- __buffer_relink_io(jh);
- transaction->t_chp_stats.cs_written++;
+
if ((batch_count == JBD2_NR_BATCH) ||
- need_resched() ||
- spin_needbreak(&journal->j_list_lock))
+ need_resched() || spin_needbreak(&journal->j_list_lock) ||
+ jh2bh(transaction->t_checkpoint_list) == journal->j_chkpt_bhs[0])
goto unlock_and_flush;
}
@@ -322,38 +310,6 @@ int jbd2_log_do_checkpoint(journal_t *journal)
goto restart;
}
- /*
- * Now we issued all of the transaction's buffers, let's deal
- * with the buffers that are out for I/O.
- */
-restart2:
- /* Did somebody clean up the transaction in the meanwhile? */
- if (journal->j_checkpoint_transactions != transaction ||
- transaction->t_tid != this_tid)
- goto out;
-
- while (transaction->t_checkpoint_io_list) {
- jh = transaction->t_checkpoint_io_list;
- bh = jh2bh(jh);
- if (buffer_locked(bh)) {
- get_bh(bh);
- spin_unlock(&journal->j_list_lock);
- wait_on_buffer(bh);
- /* the journal_head may have gone by now */
- BUFFER_TRACE(bh, "brelse");
- __brelse(bh);
- spin_lock(&journal->j_list_lock);
- goto restart2;
- }
-
- /*
- * Now in whatever state the buffer currently is, we
- * know that it has been written out and so we can
- * drop it from the list
- */
- if (__jbd2_journal_remove_checkpoint(jh))
- break;
- }
out:
spin_unlock(&journal->j_list_lock);
result = jbd2_cleanup_journal_tail(journal);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x c2d6fd9d6f35079f1669f0100f05b46708c74b7f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072457-patio-doornail-5f44@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
c2d6fd9d6f35 ("jbd2: recheck chechpointing non-dirty buffer")
4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers")
214eb5a4d8a2 ("jbd2: remove redundant buffer io error checks")
2bf31d94423c ("jbd2: fix kernel-doc markups")
60ed633f51d0 ("jbd2: fix incorrect code style")
ccd3c4373eac ("jbd2: fix use after free in jbd2_log_do_checkpoint()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c2d6fd9d6f35079f1669f0100f05b46708c74b7f Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Tue, 6 Jun 2023 21:59:23 +0800
Subject: [PATCH] jbd2: recheck chechpointing non-dirty buffer
There is a long-standing metadata corruption issue that happens from
time to time, but it's very difficult to reproduce and analyse, benefit
from the JBD2_CYCLE_RECORD option, we found out that the problem is the
checkpointing process miss to write out some buffers which are raced by
another do_get_write_access(). Looks below for detail.
jbd2_log_do_checkpoint() //transaction X
//buffer A is dirty and not belones to any transaction
__buffer_relink_io() //move it to the IO list
__flush_batch()
write_dirty_buffer()
do_get_write_access()
clear_buffer_dirty
__jbd2_journal_file_buffer()
//add buffer A to a new transaction Y
lock_buffer(bh)
//doesn't write out
__jbd2_journal_remove_checkpoint()
//finish checkpoint except buffer A
//filesystem corrupt if the new transaction Y isn't fully write out.
Due to the t_checkpoint_list walking loop in jbd2_log_do_checkpoint()
have already handles waiting for buffers under IO and re-added new
transaction to complete commit, and it also removing cleaned buffers,
this makes sure the list will eventually get empty. So it's fine to
leave buffers on the t_checkpoint_list while flushing out and completely
stop using the t_checkpoint_io_list.
Cc: stable(a)vger.kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Tested-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 51bd38da21cd..25e3c20eb19f 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -57,28 +57,6 @@ static inline void __buffer_unlink(struct journal_head *jh)
}
}
-/*
- * Move a buffer from the checkpoint list to the checkpoint io list
- *
- * Called with j_list_lock held
- */
-static inline void __buffer_relink_io(struct journal_head *jh)
-{
- transaction_t *transaction = jh->b_cp_transaction;
-
- __buffer_unlink_first(jh);
-
- if (!transaction->t_checkpoint_io_list) {
- jh->b_cpnext = jh->b_cpprev = jh;
- } else {
- jh->b_cpnext = transaction->t_checkpoint_io_list;
- jh->b_cpprev = transaction->t_checkpoint_io_list->b_cpprev;
- jh->b_cpprev->b_cpnext = jh;
- jh->b_cpnext->b_cpprev = jh;
- }
- transaction->t_checkpoint_io_list = jh;
-}
-
/*
* Check a checkpoint buffer could be release or not.
*
@@ -183,6 +161,7 @@ __flush_batch(journal_t *journal, int *batch_count)
struct buffer_head *bh = journal->j_chkpt_bhs[i];
BUFFER_TRACE(bh, "brelse");
__brelse(bh);
+ journal->j_chkpt_bhs[i] = NULL;
}
*batch_count = 0;
}
@@ -242,6 +221,11 @@ int jbd2_log_do_checkpoint(journal_t *journal)
jh = transaction->t_checkpoint_list;
bh = jh2bh(jh);
+ /*
+ * The buffer may be writing back, or flushing out in the
+ * last couple of cycles, or re-adding into a new transaction,
+ * need to check it again until it's unlocked.
+ */
if (buffer_locked(bh)) {
get_bh(bh);
spin_unlock(&journal->j_list_lock);
@@ -287,28 +271,32 @@ int jbd2_log_do_checkpoint(journal_t *journal)
}
if (!buffer_dirty(bh)) {
BUFFER_TRACE(bh, "remove from checkpoint");
- if (__jbd2_journal_remove_checkpoint(jh))
- /* The transaction was released; we're done */
+ /*
+ * If the transaction was released or the checkpoint
+ * list was empty, we're done.
+ */
+ if (__jbd2_journal_remove_checkpoint(jh) ||
+ !transaction->t_checkpoint_list)
goto out;
- continue;
+ } else {
+ /*
+ * We are about to write the buffer, it could be
+ * raced by some other transaction shrink or buffer
+ * re-log logic once we release the j_list_lock,
+ * leave it on the checkpoint list and check status
+ * again to make sure it's clean.
+ */
+ BUFFER_TRACE(bh, "queue");
+ get_bh(bh);
+ J_ASSERT_BH(bh, !buffer_jwrite(bh));
+ journal->j_chkpt_bhs[batch_count++] = bh;
+ transaction->t_chp_stats.cs_written++;
+ transaction->t_checkpoint_list = jh->b_cpnext;
}
- /*
- * Important: we are about to write the buffer, and
- * possibly block, while still holding the journal
- * lock. We cannot afford to let the transaction
- * logic start messing around with this buffer before
- * we write it to disk, as that would break
- * recoverability.
- */
- BUFFER_TRACE(bh, "queue");
- get_bh(bh);
- J_ASSERT_BH(bh, !buffer_jwrite(bh));
- journal->j_chkpt_bhs[batch_count++] = bh;
- __buffer_relink_io(jh);
- transaction->t_chp_stats.cs_written++;
+
if ((batch_count == JBD2_NR_BATCH) ||
- need_resched() ||
- spin_needbreak(&journal->j_list_lock))
+ need_resched() || spin_needbreak(&journal->j_list_lock) ||
+ jh2bh(transaction->t_checkpoint_list) == journal->j_chkpt_bhs[0])
goto unlock_and_flush;
}
@@ -322,38 +310,6 @@ int jbd2_log_do_checkpoint(journal_t *journal)
goto restart;
}
- /*
- * Now we issued all of the transaction's buffers, let's deal
- * with the buffers that are out for I/O.
- */
-restart2:
- /* Did somebody clean up the transaction in the meanwhile? */
- if (journal->j_checkpoint_transactions != transaction ||
- transaction->t_tid != this_tid)
- goto out;
-
- while (transaction->t_checkpoint_io_list) {
- jh = transaction->t_checkpoint_io_list;
- bh = jh2bh(jh);
- if (buffer_locked(bh)) {
- get_bh(bh);
- spin_unlock(&journal->j_list_lock);
- wait_on_buffer(bh);
- /* the journal_head may have gone by now */
- BUFFER_TRACE(bh, "brelse");
- __brelse(bh);
- spin_lock(&journal->j_list_lock);
- goto restart2;
- }
-
- /*
- * Now in whatever state the buffer currently is, we
- * know that it has been written out and so we can
- * drop it from the list
- */
- if (__jbd2_journal_remove_checkpoint(jh))
- break;
- }
out:
spin_unlock(&journal->j_list_lock);
result = jbd2_cleanup_journal_tail(journal);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x c2d6fd9d6f35079f1669f0100f05b46708c74b7f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072456-thermal-relieving-1b68@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
c2d6fd9d6f35 ("jbd2: recheck chechpointing non-dirty buffer")
4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers")
214eb5a4d8a2 ("jbd2: remove redundant buffer io error checks")
2bf31d94423c ("jbd2: fix kernel-doc markups")
60ed633f51d0 ("jbd2: fix incorrect code style")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c2d6fd9d6f35079f1669f0100f05b46708c74b7f Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Tue, 6 Jun 2023 21:59:23 +0800
Subject: [PATCH] jbd2: recheck chechpointing non-dirty buffer
There is a long-standing metadata corruption issue that happens from
time to time, but it's very difficult to reproduce and analyse, benefit
from the JBD2_CYCLE_RECORD option, we found out that the problem is the
checkpointing process miss to write out some buffers which are raced by
another do_get_write_access(). Looks below for detail.
jbd2_log_do_checkpoint() //transaction X
//buffer A is dirty and not belones to any transaction
__buffer_relink_io() //move it to the IO list
__flush_batch()
write_dirty_buffer()
do_get_write_access()
clear_buffer_dirty
__jbd2_journal_file_buffer()
//add buffer A to a new transaction Y
lock_buffer(bh)
//doesn't write out
__jbd2_journal_remove_checkpoint()
//finish checkpoint except buffer A
//filesystem corrupt if the new transaction Y isn't fully write out.
Due to the t_checkpoint_list walking loop in jbd2_log_do_checkpoint()
have already handles waiting for buffers under IO and re-added new
transaction to complete commit, and it also removing cleaned buffers,
this makes sure the list will eventually get empty. So it's fine to
leave buffers on the t_checkpoint_list while flushing out and completely
stop using the t_checkpoint_io_list.
Cc: stable(a)vger.kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Tested-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 51bd38da21cd..25e3c20eb19f 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -57,28 +57,6 @@ static inline void __buffer_unlink(struct journal_head *jh)
}
}
-/*
- * Move a buffer from the checkpoint list to the checkpoint io list
- *
- * Called with j_list_lock held
- */
-static inline void __buffer_relink_io(struct journal_head *jh)
-{
- transaction_t *transaction = jh->b_cp_transaction;
-
- __buffer_unlink_first(jh);
-
- if (!transaction->t_checkpoint_io_list) {
- jh->b_cpnext = jh->b_cpprev = jh;
- } else {
- jh->b_cpnext = transaction->t_checkpoint_io_list;
- jh->b_cpprev = transaction->t_checkpoint_io_list->b_cpprev;
- jh->b_cpprev->b_cpnext = jh;
- jh->b_cpnext->b_cpprev = jh;
- }
- transaction->t_checkpoint_io_list = jh;
-}
-
/*
* Check a checkpoint buffer could be release or not.
*
@@ -183,6 +161,7 @@ __flush_batch(journal_t *journal, int *batch_count)
struct buffer_head *bh = journal->j_chkpt_bhs[i];
BUFFER_TRACE(bh, "brelse");
__brelse(bh);
+ journal->j_chkpt_bhs[i] = NULL;
}
*batch_count = 0;
}
@@ -242,6 +221,11 @@ int jbd2_log_do_checkpoint(journal_t *journal)
jh = transaction->t_checkpoint_list;
bh = jh2bh(jh);
+ /*
+ * The buffer may be writing back, or flushing out in the
+ * last couple of cycles, or re-adding into a new transaction,
+ * need to check it again until it's unlocked.
+ */
if (buffer_locked(bh)) {
get_bh(bh);
spin_unlock(&journal->j_list_lock);
@@ -287,28 +271,32 @@ int jbd2_log_do_checkpoint(journal_t *journal)
}
if (!buffer_dirty(bh)) {
BUFFER_TRACE(bh, "remove from checkpoint");
- if (__jbd2_journal_remove_checkpoint(jh))
- /* The transaction was released; we're done */
+ /*
+ * If the transaction was released or the checkpoint
+ * list was empty, we're done.
+ */
+ if (__jbd2_journal_remove_checkpoint(jh) ||
+ !transaction->t_checkpoint_list)
goto out;
- continue;
+ } else {
+ /*
+ * We are about to write the buffer, it could be
+ * raced by some other transaction shrink or buffer
+ * re-log logic once we release the j_list_lock,
+ * leave it on the checkpoint list and check status
+ * again to make sure it's clean.
+ */
+ BUFFER_TRACE(bh, "queue");
+ get_bh(bh);
+ J_ASSERT_BH(bh, !buffer_jwrite(bh));
+ journal->j_chkpt_bhs[batch_count++] = bh;
+ transaction->t_chp_stats.cs_written++;
+ transaction->t_checkpoint_list = jh->b_cpnext;
}
- /*
- * Important: we are about to write the buffer, and
- * possibly block, while still holding the journal
- * lock. We cannot afford to let the transaction
- * logic start messing around with this buffer before
- * we write it to disk, as that would break
- * recoverability.
- */
- BUFFER_TRACE(bh, "queue");
- get_bh(bh);
- J_ASSERT_BH(bh, !buffer_jwrite(bh));
- journal->j_chkpt_bhs[batch_count++] = bh;
- __buffer_relink_io(jh);
- transaction->t_chp_stats.cs_written++;
+
if ((batch_count == JBD2_NR_BATCH) ||
- need_resched() ||
- spin_needbreak(&journal->j_list_lock))
+ need_resched() || spin_needbreak(&journal->j_list_lock) ||
+ jh2bh(transaction->t_checkpoint_list) == journal->j_chkpt_bhs[0])
goto unlock_and_flush;
}
@@ -322,38 +310,6 @@ int jbd2_log_do_checkpoint(journal_t *journal)
goto restart;
}
- /*
- * Now we issued all of the transaction's buffers, let's deal
- * with the buffers that are out for I/O.
- */
-restart2:
- /* Did somebody clean up the transaction in the meanwhile? */
- if (journal->j_checkpoint_transactions != transaction ||
- transaction->t_tid != this_tid)
- goto out;
-
- while (transaction->t_checkpoint_io_list) {
- jh = transaction->t_checkpoint_io_list;
- bh = jh2bh(jh);
- if (buffer_locked(bh)) {
- get_bh(bh);
- spin_unlock(&journal->j_list_lock);
- wait_on_buffer(bh);
- /* the journal_head may have gone by now */
- BUFFER_TRACE(bh, "brelse");
- __brelse(bh);
- spin_lock(&journal->j_list_lock);
- goto restart2;
- }
-
- /*
- * Now in whatever state the buffer currently is, we
- * know that it has been written out and so we can
- * drop it from the list
- */
- if (__jbd2_journal_remove_checkpoint(jh))
- break;
- }
out:
spin_unlock(&journal->j_list_lock);
result = jbd2_cleanup_journal_tail(journal);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x c2d6fd9d6f35079f1669f0100f05b46708c74b7f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072450-contour-slick-c3db@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
c2d6fd9d6f35 ("jbd2: recheck chechpointing non-dirty buffer")
4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers")
214eb5a4d8a2 ("jbd2: remove redundant buffer io error checks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c2d6fd9d6f35079f1669f0100f05b46708c74b7f Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Tue, 6 Jun 2023 21:59:23 +0800
Subject: [PATCH] jbd2: recheck chechpointing non-dirty buffer
There is a long-standing metadata corruption issue that happens from
time to time, but it's very difficult to reproduce and analyse, benefit
from the JBD2_CYCLE_RECORD option, we found out that the problem is the
checkpointing process miss to write out some buffers which are raced by
another do_get_write_access(). Looks below for detail.
jbd2_log_do_checkpoint() //transaction X
//buffer A is dirty and not belones to any transaction
__buffer_relink_io() //move it to the IO list
__flush_batch()
write_dirty_buffer()
do_get_write_access()
clear_buffer_dirty
__jbd2_journal_file_buffer()
//add buffer A to a new transaction Y
lock_buffer(bh)
//doesn't write out
__jbd2_journal_remove_checkpoint()
//finish checkpoint except buffer A
//filesystem corrupt if the new transaction Y isn't fully write out.
Due to the t_checkpoint_list walking loop in jbd2_log_do_checkpoint()
have already handles waiting for buffers under IO and re-added new
transaction to complete commit, and it also removing cleaned buffers,
this makes sure the list will eventually get empty. So it's fine to
leave buffers on the t_checkpoint_list while flushing out and completely
stop using the t_checkpoint_io_list.
Cc: stable(a)vger.kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Tested-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 51bd38da21cd..25e3c20eb19f 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -57,28 +57,6 @@ static inline void __buffer_unlink(struct journal_head *jh)
}
}
-/*
- * Move a buffer from the checkpoint list to the checkpoint io list
- *
- * Called with j_list_lock held
- */
-static inline void __buffer_relink_io(struct journal_head *jh)
-{
- transaction_t *transaction = jh->b_cp_transaction;
-
- __buffer_unlink_first(jh);
-
- if (!transaction->t_checkpoint_io_list) {
- jh->b_cpnext = jh->b_cpprev = jh;
- } else {
- jh->b_cpnext = transaction->t_checkpoint_io_list;
- jh->b_cpprev = transaction->t_checkpoint_io_list->b_cpprev;
- jh->b_cpprev->b_cpnext = jh;
- jh->b_cpnext->b_cpprev = jh;
- }
- transaction->t_checkpoint_io_list = jh;
-}
-
/*
* Check a checkpoint buffer could be release or not.
*
@@ -183,6 +161,7 @@ __flush_batch(journal_t *journal, int *batch_count)
struct buffer_head *bh = journal->j_chkpt_bhs[i];
BUFFER_TRACE(bh, "brelse");
__brelse(bh);
+ journal->j_chkpt_bhs[i] = NULL;
}
*batch_count = 0;
}
@@ -242,6 +221,11 @@ int jbd2_log_do_checkpoint(journal_t *journal)
jh = transaction->t_checkpoint_list;
bh = jh2bh(jh);
+ /*
+ * The buffer may be writing back, or flushing out in the
+ * last couple of cycles, or re-adding into a new transaction,
+ * need to check it again until it's unlocked.
+ */
if (buffer_locked(bh)) {
get_bh(bh);
spin_unlock(&journal->j_list_lock);
@@ -287,28 +271,32 @@ int jbd2_log_do_checkpoint(journal_t *journal)
}
if (!buffer_dirty(bh)) {
BUFFER_TRACE(bh, "remove from checkpoint");
- if (__jbd2_journal_remove_checkpoint(jh))
- /* The transaction was released; we're done */
+ /*
+ * If the transaction was released or the checkpoint
+ * list was empty, we're done.
+ */
+ if (__jbd2_journal_remove_checkpoint(jh) ||
+ !transaction->t_checkpoint_list)
goto out;
- continue;
+ } else {
+ /*
+ * We are about to write the buffer, it could be
+ * raced by some other transaction shrink or buffer
+ * re-log logic once we release the j_list_lock,
+ * leave it on the checkpoint list and check status
+ * again to make sure it's clean.
+ */
+ BUFFER_TRACE(bh, "queue");
+ get_bh(bh);
+ J_ASSERT_BH(bh, !buffer_jwrite(bh));
+ journal->j_chkpt_bhs[batch_count++] = bh;
+ transaction->t_chp_stats.cs_written++;
+ transaction->t_checkpoint_list = jh->b_cpnext;
}
- /*
- * Important: we are about to write the buffer, and
- * possibly block, while still holding the journal
- * lock. We cannot afford to let the transaction
- * logic start messing around with this buffer before
- * we write it to disk, as that would break
- * recoverability.
- */
- BUFFER_TRACE(bh, "queue");
- get_bh(bh);
- J_ASSERT_BH(bh, !buffer_jwrite(bh));
- journal->j_chkpt_bhs[batch_count++] = bh;
- __buffer_relink_io(jh);
- transaction->t_chp_stats.cs_written++;
+
if ((batch_count == JBD2_NR_BATCH) ||
- need_resched() ||
- spin_needbreak(&journal->j_list_lock))
+ need_resched() || spin_needbreak(&journal->j_list_lock) ||
+ jh2bh(transaction->t_checkpoint_list) == journal->j_chkpt_bhs[0])
goto unlock_and_flush;
}
@@ -322,38 +310,6 @@ int jbd2_log_do_checkpoint(journal_t *journal)
goto restart;
}
- /*
- * Now we issued all of the transaction's buffers, let's deal
- * with the buffers that are out for I/O.
- */
-restart2:
- /* Did somebody clean up the transaction in the meanwhile? */
- if (journal->j_checkpoint_transactions != transaction ||
- transaction->t_tid != this_tid)
- goto out;
-
- while (transaction->t_checkpoint_io_list) {
- jh = transaction->t_checkpoint_io_list;
- bh = jh2bh(jh);
- if (buffer_locked(bh)) {
- get_bh(bh);
- spin_unlock(&journal->j_list_lock);
- wait_on_buffer(bh);
- /* the journal_head may have gone by now */
- BUFFER_TRACE(bh, "brelse");
- __brelse(bh);
- spin_lock(&journal->j_list_lock);
- goto restart2;
- }
-
- /*
- * Now in whatever state the buffer currently is, we
- * know that it has been written out and so we can
- * drop it from the list
- */
- if (__jbd2_journal_remove_checkpoint(jh))
- break;
- }
out:
spin_unlock(&journal->j_list_lock);
result = jbd2_cleanup_journal_tail(journal);
From: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
[ Upstream commit 6bd6cd29c92401a101993290051fa55078238a52 ]
Returning early from stm32_usart_serial_remove() results in a resource
leak as several cleanup functions are not called. The driver core ignores
the return value and there is no possibility to clean up later.
uart_remove_one_port() only returns non-zero if there is some
inconsistency (i.e. stm32_usart_driver.state[port->line].uart_port == NULL).
This should never happen, and even if it does it's a bad idea to exit
early in the remove callback without cleaning up.
This prepares changing the prototype of struct platform_driver::remove to
return void. See commit 5c5a7680e67b ("platform: Provide a remove callback
that returns no value") for further details about this quest.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Link: https://lore.kernel.org/r/20230512173810.131447-2-u.kleine-koenig@pengutron…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/tty/serial/stm32-usart.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/tty/serial/stm32-usart.c b/drivers/tty/serial/stm32-usart.c
index 1e38fc9b10c11..e9e11a2596211 100644
--- a/drivers/tty/serial/stm32-usart.c
+++ b/drivers/tty/serial/stm32-usart.c
@@ -1755,13 +1755,10 @@ static int stm32_usart_serial_remove(struct platform_device *pdev)
struct uart_port *port = platform_get_drvdata(pdev);
struct stm32_port *stm32_port = to_stm32_port(port);
const struct stm32_usart_offsets *ofs = &stm32_port->info->ofs;
- int err;
u32 cr3;
pm_runtime_get_sync(&pdev->dev);
- err = uart_remove_one_port(&stm32_usart_driver, port);
- if (err)
- return(err);
+ uart_remove_one_port(&stm32_usart_driver, port);
pm_runtime_disable(&pdev->dev);
pm_runtime_set_suspended(&pdev->dev);
--
2.39.2
I hope this message finds you well. I am currently downsizing my Late Husband Steinway Piano and have come to the decision of finding a new home for my late husband's piano. It holds sentimental value to me, and I want to ensure it goes to someone who will appreciate and cherish it as much as he did.
If you or anyone you know is interested in acquiring a Steinway piano, I would be delighted to hear from you. I believe that finding a suitable owner who will give it the care and attention it deserves is of utmost importance. The piano has been well-maintained over the years and is in excellent condition.
Please feel free to reach out to me at your earliest convenience. I send you more pictures of the pictures in my next email.
Cheers,
Linda Ramey
Email: rsharon1759(a)gmail.com
From: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
[ Upstream commit 6bd6cd29c92401a101993290051fa55078238a52 ]
Returning early from stm32_usart_serial_remove() results in a resource
leak as several cleanup functions are not called. The driver core ignores
the return value and there is no possibility to clean up later.
uart_remove_one_port() only returns non-zero if there is some
inconsistency (i.e. stm32_usart_driver.state[port->line].uart_port == NULL).
This should never happen, and even if it does it's a bad idea to exit
early in the remove callback without cleaning up.
This prepares changing the prototype of struct platform_driver::remove to
return void. See commit 5c5a7680e67b ("platform: Provide a remove callback
that returns no value") for further details about this quest.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Link: https://lore.kernel.org/r/20230512173810.131447-2-u.kleine-koenig@pengutron…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/tty/serial/stm32-usart.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/tty/serial/stm32-usart.c b/drivers/tty/serial/stm32-usart.c
index 28edbaf7bb329..2a9c4058824a8 100644
--- a/drivers/tty/serial/stm32-usart.c
+++ b/drivers/tty/serial/stm32-usart.c
@@ -1753,13 +1753,10 @@ static int stm32_usart_serial_remove(struct platform_device *pdev)
struct uart_port *port = platform_get_drvdata(pdev);
struct stm32_port *stm32_port = to_stm32_port(port);
const struct stm32_usart_offsets *ofs = &stm32_port->info->ofs;
- int err;
u32 cr3;
pm_runtime_get_sync(&pdev->dev);
- err = uart_remove_one_port(&stm32_usart_driver, port);
- if (err)
- return(err);
+ uart_remove_one_port(&stm32_usart_driver, port);
pm_runtime_disable(&pdev->dev);
pm_runtime_set_suspended(&pdev->dev);
--
2.39.2
From: hackyzh002 <hackyzh002(a)gmail.com>
[ Upstream commit f828b681d0cd566f86351c0b913e6cb6ed8c7b9c ]
The type of size is unsigned, if size is 0x40000000, there will be an
integer overflow, size will be zero after size *= sizeof(uint32_t),
will cause uninitialized memory to be referenced later
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: hackyzh002 <hackyzh002(a)gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/radeon/radeon_cs.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 1ae31dbc61c64..5e61abb3dce5c 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -265,7 +265,8 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
{
struct drm_radeon_cs *cs = data;
uint64_t *chunk_array_ptr;
- unsigned size, i;
+ u64 size;
+ unsigned i;
u32 ring = RADEON_CS_RING_GFX;
s32 priority = 0;
--
2.39.2
From: hackyzh002 <hackyzh002(a)gmail.com>
[ Upstream commit f828b681d0cd566f86351c0b913e6cb6ed8c7b9c ]
The type of size is unsigned, if size is 0x40000000, there will be an
integer overflow, size will be zero after size *= sizeof(uint32_t),
will cause uninitialized memory to be referenced later
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: hackyzh002 <hackyzh002(a)gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/radeon/radeon_cs.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 7b54606783821..ba64dad1d7c9e 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -271,7 +271,8 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
{
struct drm_radeon_cs *cs = data;
uint64_t *chunk_array_ptr;
- unsigned size, i;
+ u64 size;
+ unsigned i;
u32 ring = RADEON_CS_RING_GFX;
s32 priority = 0;
--
2.39.2
From: hackyzh002 <hackyzh002(a)gmail.com>
[ Upstream commit f828b681d0cd566f86351c0b913e6cb6ed8c7b9c ]
The type of size is unsigned, if size is 0x40000000, there will be an
integer overflow, size will be zero after size *= sizeof(uint32_t),
will cause uninitialized memory to be referenced later
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: hackyzh002 <hackyzh002(a)gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/radeon/radeon_cs.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 9ed2b2700e0a5..e5fbe851ed930 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -270,7 +270,8 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
{
struct drm_radeon_cs *cs = data;
uint64_t *chunk_array_ptr;
- unsigned size, i;
+ u64 size;
+ unsigned i;
u32 ring = RADEON_CS_RING_GFX;
s32 priority = 0;
--
2.39.2
The Hyper-V host is queried to get the max transfer size that it
supports, and this value is used to set max_sectors for the synthetic
SCSI controller. However, this max transfer size may be too large
for virtual Fibre Channel devices, which are limited to 512 Kbytes.
If a larger transfer size is used with a vFC device, Hyper-V always
returns an error, and storvsc logs a message like this where the SRB
status and SCSI status are both zero:
hv_storvsc <GUID>: tag#197 cmd 0x8a status: scsi 0x0 srb 0x0 hv 0xc0000001
Add logic to limit the max transfer size to 512 Kbytes for vFC devices.
Fixes: 1d3e0980782f ("scsi: storvsc: Correct reporting of Hyper-V I/O size limits")
Cc: stable(a)vger.kernel.org
Signed-off-by: Michael Kelley <mikelley(a)microsoft.com>
---
drivers/scsi/storvsc_drv.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 7f12d93..f282321 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -366,6 +366,7 @@ enum storvsc_request_type {
#define STORVSC_FC_MAX_LUNS_PER_TARGET 255
#define STORVSC_FC_MAX_TARGETS 128
#define STORVSC_FC_MAX_CHANNELS 8
+#define STORVSC_FC_MAX_XFER_SIZE ((u32)(512 * 1024))
#define STORVSC_IDE_MAX_LUNS_PER_TARGET 64
#define STORVSC_IDE_MAX_TARGETS 1
@@ -2006,6 +2007,9 @@ static int storvsc_probe(struct hv_device *device,
* protecting it from any weird value.
*/
max_xfer_bytes = round_down(stor_device->max_transfer_bytes, HV_HYP_PAGE_SIZE);
+ if (is_fc)
+ max_xfer_bytes = min(max_xfer_bytes, STORVSC_FC_MAX_XFER_SIZE);
+
/* max_hw_sectors_kb */
host->max_sectors = max_xfer_bytes >> 9;
/*
--
1.8.3.1
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x b321c31c9b7b309dcde5e8854b741c8e6a9a05f0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072322-posing-dispatch-863e@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
b321c31c9b7b ("KVM: arm64: vgic-v4: Make the doorbell request robust w.r.t preemption")
0c2f9acf6ae7 ("KVM: arm64: PMU: Don't overwrite PMUSERENR with vcpu loaded")
8681f7175901 ("KVM: arm64: PMU: Restore the host's PMUSERENR_EL0")
009d6dc87a56 ("ARM: perf: Allow the use of the PMUv3 driver on 32bit ARM")
711432770f78 ("perf: pmuv3: Abstract PMU version checks")
df29ddf4f04b ("arm64: perf: Abstract system register accesses away")
7755cec63ade ("arm64: perf: Move PMUv3 driver to drivers/perf")
cc91b9481605 ("arm64/perf: Replace PMU version number '0' with ID_AA64DFR0_EL1_PMUVer_NI")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b321c31c9b7b309dcde5e8854b741c8e6a9a05f0 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz(a)kernel.org>
Date: Thu, 13 Jul 2023 08:06:57 +0100
Subject: [PATCH] KVM: arm64: vgic-v4: Make the doorbell request robust w.r.t
preemption
Xiang reports that VMs occasionally fail to boot on GICv4.1 systems when
running a preemptible kernel, as it is possible that a vCPU is blocked
without requesting a doorbell interrupt.
The issue is that any preemption that occurs between vgic_v4_put() and
schedule() on the block path will mark the vPE as nonresident and *not*
request a doorbell irq. This occurs because when the vcpu thread is
resumed on its way to block, vcpu_load() will make the vPE resident
again. Once the vcpu actually blocks, we don't request a doorbell
anymore, and the vcpu won't be woken up on interrupt delivery.
Fix it by tracking that we're entering WFI, and key the doorbell
request on that flag. This allows us not to make the vPE resident
when going through a preempt/schedule cycle, meaning we don't lose
any state.
Cc: stable(a)vger.kernel.org
Fixes: 8e01d9a396e6 ("KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put")
Reported-by: Xiang Chen <chenxiang66(a)hisilicon.com>
Suggested-by: Zenghui Yu <yuzenghui(a)huawei.com>
Tested-by: Xiang Chen <chenxiang66(a)hisilicon.com>
Co-developed-by: Oliver Upton <oliver.upton(a)linux.dev>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Acked-by: Zenghui Yu <yuzenghui(a)huawei.com>
Link: https://lore.kernel.org/r/20230713070657.3873244-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 8b6096753740..d3dd05bbfe23 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -727,6 +727,8 @@ struct kvm_vcpu_arch {
#define DBG_SS_ACTIVE_PENDING __vcpu_single_flag(sflags, BIT(5))
/* PMUSERENR for the guest EL0 is on physical CPU */
#define PMUSERENR_ON_CPU __vcpu_single_flag(sflags, BIT(6))
+/* WFI instruction trapped */
+#define IN_WFI __vcpu_single_flag(sflags, BIT(7))
/* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a402ea5511f3..72dc53a75d1c 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -718,13 +718,15 @@ void kvm_vcpu_wfi(struct kvm_vcpu *vcpu)
*/
preempt_disable();
kvm_vgic_vmcr_sync(vcpu);
- vgic_v4_put(vcpu, true);
+ vcpu_set_flag(vcpu, IN_WFI);
+ vgic_v4_put(vcpu);
preempt_enable();
kvm_vcpu_halt(vcpu);
vcpu_clear_flag(vcpu, IN_WFIT);
preempt_disable();
+ vcpu_clear_flag(vcpu, IN_WFI);
vgic_v4_load(vcpu);
preempt_enable();
}
@@ -792,7 +794,7 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
if (kvm_check_request(KVM_REQ_RELOAD_GICv4, vcpu)) {
/* The distributor enable bits were changed */
preempt_disable();
- vgic_v4_put(vcpu, false);
+ vgic_v4_put(vcpu);
vgic_v4_load(vcpu);
preempt_enable();
}
diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index c3b8e132d599..3dfc8b84e03e 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -749,7 +749,7 @@ void vgic_v3_put(struct kvm_vcpu *vcpu)
{
struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
- WARN_ON(vgic_v4_put(vcpu, false));
+ WARN_ON(vgic_v4_put(vcpu));
vgic_v3_vmcr_sync(vcpu);
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index c1c28fe680ba..339a55194b2c 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -336,14 +336,14 @@ void vgic_v4_teardown(struct kvm *kvm)
its_vm->vpes = NULL;
}
-int vgic_v4_put(struct kvm_vcpu *vcpu, bool need_db)
+int vgic_v4_put(struct kvm_vcpu *vcpu)
{
struct its_vpe *vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;
if (!vgic_supports_direct_msis(vcpu->kvm) || !vpe->resident)
return 0;
- return its_make_vpe_non_resident(vpe, need_db);
+ return its_make_vpe_non_resident(vpe, !!vcpu_get_flag(vcpu, IN_WFI));
}
int vgic_v4_load(struct kvm_vcpu *vcpu)
@@ -354,6 +354,9 @@ int vgic_v4_load(struct kvm_vcpu *vcpu)
if (!vgic_supports_direct_msis(vcpu->kvm) || vpe->resident)
return 0;
+ if (vcpu_get_flag(vcpu, IN_WFI))
+ return 0;
+
/*
* Before making the VPE resident, make sure the redistributor
* corresponding to our current CPU expects us here. See the
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 402b545959af..5b27f94d4fad 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -431,7 +431,7 @@ int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int irq,
int vgic_v4_load(struct kvm_vcpu *vcpu);
void vgic_v4_commit(struct kvm_vcpu *vcpu);
-int vgic_v4_put(struct kvm_vcpu *vcpu, bool need_db);
+int vgic_v4_put(struct kvm_vcpu *vcpu);
/* CPU HP callbacks */
void kvm_vgic_cpu_up(void);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x df6556adf27b7372cfcd97e1c0afb0d516c8279f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072359-overfull-ascent-520d@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
df6556adf27b ("KVM: arm64: Correctly handle page aging notifiers for unaligned memslot")
ca5de2448c3b ("KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks")
6b91b8f95cad ("KVM: arm64: Use an opaque type for pteps")
fa002e8e79b3 ("KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data")
2a611c7f87f2 ("KVM: arm64: Pass mm_ops through the visitor context")
83844a2317ec ("KVM: arm64: Stash observed pte value in visitor context")
dfc7a7769ab7 ("KVM: arm64: Combine visitor arguments into a context structure")
094d00f8ca58 ("KVM: arm64: pkvm: Use the mm_ops indirection for cache maintenance")
e82edcc75c4e ("KVM: arm64: Implement do_share() helper for sharing memory")
82bb02445de5 ("KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2")
d6b4bd3f4897 ("KVM: arm64: Fixup hyp stage-1 refcount")
2ea2ff91e822 ("KVM: arm64: Refcount hyp stage-1 pgtable pages")
cf0c7125d578 ("Merge branch kvm-arm64/mmu/el2-tracking into kvmarm-master/next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From df6556adf27b7372cfcd97e1c0afb0d516c8279f Mon Sep 17 00:00:00 2001
From: Oliver Upton <oliver.upton(a)linux.dev>
Date: Tue, 27 Jun 2023 23:54:05 +0000
Subject: [PATCH] KVM: arm64: Correctly handle page aging notifiers for
unaligned memslot
Userspace is allowed to select any PAGE_SIZE aligned hva to back guest
memory. This is even the case with hugepages, although it is a rather
suboptimal configuration as PTE level mappings are used at stage-2.
The arm64 page aging handlers have an assumption that the specified
range is exactly one page/block of memory, which in the aforementioned
case is not necessarily true. All together this leads to the WARN() in
kvm_age_gfn() firing.
However, the WARN is only part of the issue as the table walkers visit
at most a single leaf PTE. For hugepage-backed memory in a memslot that
isn't hugepage-aligned, page aging entirely misses accesses to the
hugepage beyond the first page in the memslot.
Add a new walker dedicated to handling page aging MMU notifiers capable
of walking a range of PTEs. Convert kvm(_test)_age_gfn() over to the new
walker and drop the WARN that caught the issue in the first place. The
implementation of this walker was inspired by the test_clear_young()
implementation by Yu Zhao [*], but repurposed to address a bug in the
existing aging implementation.
Cc: stable(a)vger.kernel.org # v5.15
Fixes: 056aad67f836 ("kvm: arm/arm64: Rework gpa callback handlers")
Link: https://lore.kernel.org/kvmarm/20230526234435.662652-6-yuzhao@google.com/
Co-developed-by: Yu Zhao <yuzhao(a)google.com>
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Reported-by: Reiji Watanabe <reijiw(a)google.com>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Reviewed-by: Shaoqin Huang <shahuang(a)redhat.com>
Link: https://lore.kernel.org/r/20230627235405.4069823-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 8294a9a7e566..929d355eae0a 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -608,22 +608,26 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
/**
- * kvm_pgtable_stage2_mkold() - Clear the access flag in a page-table entry.
+ * kvm_pgtable_stage2_test_clear_young() - Test and optionally clear the access
+ * flag in a page-table entry.
* @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
* @addr: Intermediate physical address to identify the page-table entry.
+ * @size: Size of the address range to visit.
+ * @mkold: True if the access flag should be cleared.
*
* The offset of @addr within a page is ignored.
*
- * If there is a valid, leaf page-table entry used to translate @addr, then
- * clear the access flag in that entry.
+ * Tests and conditionally clears the access flag for every valid, leaf
+ * page-table entry used to translate the range [@addr, @addr + @size).
*
* Note that it is the caller's responsibility to invalidate the TLB after
* calling this function to ensure that the updated permissions are visible
* to the CPUs.
*
- * Return: The old page-table entry prior to clearing the flag, 0 on failure.
+ * Return: True if any of the visited PTEs had the access flag set.
*/
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold);
/**
* kvm_pgtable_stage2_relax_perms() - Relax the permissions enforced by a
@@ -645,18 +649,6 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
enum kvm_pgtable_prot prot);
-/**
- * kvm_pgtable_stage2_is_young() - Test whether a page-table entry has the
- * access flag set.
- * @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
- * @addr: Intermediate physical address to identify the page-table entry.
- *
- * The offset of @addr within a page is ignored.
- *
- * Return: True if the page-table entry has the access flag set, false otherwise.
- */
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr);
-
/**
* kvm_pgtable_stage2_flush_range() - Clean and invalidate data cache to Point
* of Coherency for guest stage-2 address
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index aa740a974e02..f7a93ef29250 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1195,25 +1195,54 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
return pte;
}
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
+struct stage2_age_data {
+ bool mkold;
+ bool young;
+};
+
+static int stage2_age_walker(const struct kvm_pgtable_visit_ctx *ctx,
+ enum kvm_pgtable_walk_flags visit)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF,
- &pte, NULL, 0);
+ kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data *data = ctx->arg;
+
+ if (!kvm_pte_valid(ctx->old) || new == ctx->old)
+ return 0;
+
+ data->young = true;
+
+ /*
+ * stage2_age_walker() is always called while holding the MMU lock for
+ * write, so this will always succeed. Nonetheless, this deliberately
+ * follows the race detection pattern of the other stage-2 walkers in
+ * case the locking mechanics of the MMU notifiers is ever changed.
+ */
+ if (data->mkold && !stage2_try_set_pte(ctx, new))
+ return -EAGAIN;
+
/*
* "But where's the TLBI?!", you scream.
* "Over in the core code", I sigh.
*
* See the '->clear_flush_young()' callback on the KVM mmu notifier.
*/
- return pte;
+ return 0;
}
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, 0);
- return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data data = {
+ .mkold = mkold,
+ };
+ struct kvm_pgtable_walker walker = {
+ .cb = stage2_age_walker,
+ .arg = &data,
+ .flags = KVM_PGTABLE_WALK_LEAF,
+ };
+
+ WARN_ON(kvm_pgtable_walk(pgt, addr, size, &walker));
+ return data.young;
}
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 6db9ef288ec3..d3b4feed460c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1756,27 +1756,25 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
u64 size = (range->end - range->start) << PAGE_SHIFT;
- kvm_pte_t kpte;
- pte_t pte;
if (!kvm->arch.mmu.pgt)
return false;
- WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
-
- kpte = kvm_pgtable_stage2_mkold(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
- pte = __pte(kpte);
- return pte_valid(pte) && pte_young(pte);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, true);
}
bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
+ u64 size = (range->end - range->start) << PAGE_SHIFT;
+
if (!kvm->arch.mmu.pgt)
return false;
- return kvm_pgtable_stage2_is_young(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, false);
}
phys_addr_t kvm_mmu_get_httbr(void)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x df6556adf27b7372cfcd97e1c0afb0d516c8279f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072357-trilogy-average-5fd9@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
df6556adf27b ("KVM: arm64: Correctly handle page aging notifiers for unaligned memslot")
ca5de2448c3b ("KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks")
6b91b8f95cad ("KVM: arm64: Use an opaque type for pteps")
fa002e8e79b3 ("KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data")
2a611c7f87f2 ("KVM: arm64: Pass mm_ops through the visitor context")
83844a2317ec ("KVM: arm64: Stash observed pte value in visitor context")
dfc7a7769ab7 ("KVM: arm64: Combine visitor arguments into a context structure")
094d00f8ca58 ("KVM: arm64: pkvm: Use the mm_ops indirection for cache maintenance")
e82edcc75c4e ("KVM: arm64: Implement do_share() helper for sharing memory")
82bb02445de5 ("KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2")
d6b4bd3f4897 ("KVM: arm64: Fixup hyp stage-1 refcount")
2ea2ff91e822 ("KVM: arm64: Refcount hyp stage-1 pgtable pages")
cf0c7125d578 ("Merge branch kvm-arm64/mmu/el2-tracking into kvmarm-master/next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From df6556adf27b7372cfcd97e1c0afb0d516c8279f Mon Sep 17 00:00:00 2001
From: Oliver Upton <oliver.upton(a)linux.dev>
Date: Tue, 27 Jun 2023 23:54:05 +0000
Subject: [PATCH] KVM: arm64: Correctly handle page aging notifiers for
unaligned memslot
Userspace is allowed to select any PAGE_SIZE aligned hva to back guest
memory. This is even the case with hugepages, although it is a rather
suboptimal configuration as PTE level mappings are used at stage-2.
The arm64 page aging handlers have an assumption that the specified
range is exactly one page/block of memory, which in the aforementioned
case is not necessarily true. All together this leads to the WARN() in
kvm_age_gfn() firing.
However, the WARN is only part of the issue as the table walkers visit
at most a single leaf PTE. For hugepage-backed memory in a memslot that
isn't hugepage-aligned, page aging entirely misses accesses to the
hugepage beyond the first page in the memslot.
Add a new walker dedicated to handling page aging MMU notifiers capable
of walking a range of PTEs. Convert kvm(_test)_age_gfn() over to the new
walker and drop the WARN that caught the issue in the first place. The
implementation of this walker was inspired by the test_clear_young()
implementation by Yu Zhao [*], but repurposed to address a bug in the
existing aging implementation.
Cc: stable(a)vger.kernel.org # v5.15
Fixes: 056aad67f836 ("kvm: arm/arm64: Rework gpa callback handlers")
Link: https://lore.kernel.org/kvmarm/20230526234435.662652-6-yuzhao@google.com/
Co-developed-by: Yu Zhao <yuzhao(a)google.com>
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Reported-by: Reiji Watanabe <reijiw(a)google.com>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Reviewed-by: Shaoqin Huang <shahuang(a)redhat.com>
Link: https://lore.kernel.org/r/20230627235405.4069823-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 8294a9a7e566..929d355eae0a 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -608,22 +608,26 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
/**
- * kvm_pgtable_stage2_mkold() - Clear the access flag in a page-table entry.
+ * kvm_pgtable_stage2_test_clear_young() - Test and optionally clear the access
+ * flag in a page-table entry.
* @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
* @addr: Intermediate physical address to identify the page-table entry.
+ * @size: Size of the address range to visit.
+ * @mkold: True if the access flag should be cleared.
*
* The offset of @addr within a page is ignored.
*
- * If there is a valid, leaf page-table entry used to translate @addr, then
- * clear the access flag in that entry.
+ * Tests and conditionally clears the access flag for every valid, leaf
+ * page-table entry used to translate the range [@addr, @addr + @size).
*
* Note that it is the caller's responsibility to invalidate the TLB after
* calling this function to ensure that the updated permissions are visible
* to the CPUs.
*
- * Return: The old page-table entry prior to clearing the flag, 0 on failure.
+ * Return: True if any of the visited PTEs had the access flag set.
*/
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold);
/**
* kvm_pgtable_stage2_relax_perms() - Relax the permissions enforced by a
@@ -645,18 +649,6 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
enum kvm_pgtable_prot prot);
-/**
- * kvm_pgtable_stage2_is_young() - Test whether a page-table entry has the
- * access flag set.
- * @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
- * @addr: Intermediate physical address to identify the page-table entry.
- *
- * The offset of @addr within a page is ignored.
- *
- * Return: True if the page-table entry has the access flag set, false otherwise.
- */
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr);
-
/**
* kvm_pgtable_stage2_flush_range() - Clean and invalidate data cache to Point
* of Coherency for guest stage-2 address
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index aa740a974e02..f7a93ef29250 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1195,25 +1195,54 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
return pte;
}
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
+struct stage2_age_data {
+ bool mkold;
+ bool young;
+};
+
+static int stage2_age_walker(const struct kvm_pgtable_visit_ctx *ctx,
+ enum kvm_pgtable_walk_flags visit)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF,
- &pte, NULL, 0);
+ kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data *data = ctx->arg;
+
+ if (!kvm_pte_valid(ctx->old) || new == ctx->old)
+ return 0;
+
+ data->young = true;
+
+ /*
+ * stage2_age_walker() is always called while holding the MMU lock for
+ * write, so this will always succeed. Nonetheless, this deliberately
+ * follows the race detection pattern of the other stage-2 walkers in
+ * case the locking mechanics of the MMU notifiers is ever changed.
+ */
+ if (data->mkold && !stage2_try_set_pte(ctx, new))
+ return -EAGAIN;
+
/*
* "But where's the TLBI?!", you scream.
* "Over in the core code", I sigh.
*
* See the '->clear_flush_young()' callback on the KVM mmu notifier.
*/
- return pte;
+ return 0;
}
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, 0);
- return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data data = {
+ .mkold = mkold,
+ };
+ struct kvm_pgtable_walker walker = {
+ .cb = stage2_age_walker,
+ .arg = &data,
+ .flags = KVM_PGTABLE_WALK_LEAF,
+ };
+
+ WARN_ON(kvm_pgtable_walk(pgt, addr, size, &walker));
+ return data.young;
}
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 6db9ef288ec3..d3b4feed460c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1756,27 +1756,25 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
u64 size = (range->end - range->start) << PAGE_SHIFT;
- kvm_pte_t kpte;
- pte_t pte;
if (!kvm->arch.mmu.pgt)
return false;
- WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
-
- kpte = kvm_pgtable_stage2_mkold(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
- pte = __pte(kpte);
- return pte_valid(pte) && pte_young(pte);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, true);
}
bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
+ u64 size = (range->end - range->start) << PAGE_SHIFT;
+
if (!kvm->arch.mmu.pgt)
return false;
- return kvm_pgtable_stage2_is_young(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, false);
}
phys_addr_t kvm_mmu_get_httbr(void)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x df6556adf27b7372cfcd97e1c0afb0d516c8279f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072356-margin-freckled-cde8@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
df6556adf27b ("KVM: arm64: Correctly handle page aging notifiers for unaligned memslot")
ca5de2448c3b ("KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks")
6b91b8f95cad ("KVM: arm64: Use an opaque type for pteps")
fa002e8e79b3 ("KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data")
2a611c7f87f2 ("KVM: arm64: Pass mm_ops through the visitor context")
83844a2317ec ("KVM: arm64: Stash observed pte value in visitor context")
dfc7a7769ab7 ("KVM: arm64: Combine visitor arguments into a context structure")
094d00f8ca58 ("KVM: arm64: pkvm: Use the mm_ops indirection for cache maintenance")
e82edcc75c4e ("KVM: arm64: Implement do_share() helper for sharing memory")
82bb02445de5 ("KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2")
d6b4bd3f4897 ("KVM: arm64: Fixup hyp stage-1 refcount")
2ea2ff91e822 ("KVM: arm64: Refcount hyp stage-1 pgtable pages")
cf0c7125d578 ("Merge branch kvm-arm64/mmu/el2-tracking into kvmarm-master/next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From df6556adf27b7372cfcd97e1c0afb0d516c8279f Mon Sep 17 00:00:00 2001
From: Oliver Upton <oliver.upton(a)linux.dev>
Date: Tue, 27 Jun 2023 23:54:05 +0000
Subject: [PATCH] KVM: arm64: Correctly handle page aging notifiers for
unaligned memslot
Userspace is allowed to select any PAGE_SIZE aligned hva to back guest
memory. This is even the case with hugepages, although it is a rather
suboptimal configuration as PTE level mappings are used at stage-2.
The arm64 page aging handlers have an assumption that the specified
range is exactly one page/block of memory, which in the aforementioned
case is not necessarily true. All together this leads to the WARN() in
kvm_age_gfn() firing.
However, the WARN is only part of the issue as the table walkers visit
at most a single leaf PTE. For hugepage-backed memory in a memslot that
isn't hugepage-aligned, page aging entirely misses accesses to the
hugepage beyond the first page in the memslot.
Add a new walker dedicated to handling page aging MMU notifiers capable
of walking a range of PTEs. Convert kvm(_test)_age_gfn() over to the new
walker and drop the WARN that caught the issue in the first place. The
implementation of this walker was inspired by the test_clear_young()
implementation by Yu Zhao [*], but repurposed to address a bug in the
existing aging implementation.
Cc: stable(a)vger.kernel.org # v5.15
Fixes: 056aad67f836 ("kvm: arm/arm64: Rework gpa callback handlers")
Link: https://lore.kernel.org/kvmarm/20230526234435.662652-6-yuzhao@google.com/
Co-developed-by: Yu Zhao <yuzhao(a)google.com>
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Reported-by: Reiji Watanabe <reijiw(a)google.com>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Reviewed-by: Shaoqin Huang <shahuang(a)redhat.com>
Link: https://lore.kernel.org/r/20230627235405.4069823-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 8294a9a7e566..929d355eae0a 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -608,22 +608,26 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
/**
- * kvm_pgtable_stage2_mkold() - Clear the access flag in a page-table entry.
+ * kvm_pgtable_stage2_test_clear_young() - Test and optionally clear the access
+ * flag in a page-table entry.
* @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
* @addr: Intermediate physical address to identify the page-table entry.
+ * @size: Size of the address range to visit.
+ * @mkold: True if the access flag should be cleared.
*
* The offset of @addr within a page is ignored.
*
- * If there is a valid, leaf page-table entry used to translate @addr, then
- * clear the access flag in that entry.
+ * Tests and conditionally clears the access flag for every valid, leaf
+ * page-table entry used to translate the range [@addr, @addr + @size).
*
* Note that it is the caller's responsibility to invalidate the TLB after
* calling this function to ensure that the updated permissions are visible
* to the CPUs.
*
- * Return: The old page-table entry prior to clearing the flag, 0 on failure.
+ * Return: True if any of the visited PTEs had the access flag set.
*/
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold);
/**
* kvm_pgtable_stage2_relax_perms() - Relax the permissions enforced by a
@@ -645,18 +649,6 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
enum kvm_pgtable_prot prot);
-/**
- * kvm_pgtable_stage2_is_young() - Test whether a page-table entry has the
- * access flag set.
- * @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
- * @addr: Intermediate physical address to identify the page-table entry.
- *
- * The offset of @addr within a page is ignored.
- *
- * Return: True if the page-table entry has the access flag set, false otherwise.
- */
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr);
-
/**
* kvm_pgtable_stage2_flush_range() - Clean and invalidate data cache to Point
* of Coherency for guest stage-2 address
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index aa740a974e02..f7a93ef29250 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1195,25 +1195,54 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
return pte;
}
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
+struct stage2_age_data {
+ bool mkold;
+ bool young;
+};
+
+static int stage2_age_walker(const struct kvm_pgtable_visit_ctx *ctx,
+ enum kvm_pgtable_walk_flags visit)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF,
- &pte, NULL, 0);
+ kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data *data = ctx->arg;
+
+ if (!kvm_pte_valid(ctx->old) || new == ctx->old)
+ return 0;
+
+ data->young = true;
+
+ /*
+ * stage2_age_walker() is always called while holding the MMU lock for
+ * write, so this will always succeed. Nonetheless, this deliberately
+ * follows the race detection pattern of the other stage-2 walkers in
+ * case the locking mechanics of the MMU notifiers is ever changed.
+ */
+ if (data->mkold && !stage2_try_set_pte(ctx, new))
+ return -EAGAIN;
+
/*
* "But where's the TLBI?!", you scream.
* "Over in the core code", I sigh.
*
* See the '->clear_flush_young()' callback on the KVM mmu notifier.
*/
- return pte;
+ return 0;
}
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, 0);
- return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data data = {
+ .mkold = mkold,
+ };
+ struct kvm_pgtable_walker walker = {
+ .cb = stage2_age_walker,
+ .arg = &data,
+ .flags = KVM_PGTABLE_WALK_LEAF,
+ };
+
+ WARN_ON(kvm_pgtable_walk(pgt, addr, size, &walker));
+ return data.young;
}
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 6db9ef288ec3..d3b4feed460c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1756,27 +1756,25 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
u64 size = (range->end - range->start) << PAGE_SHIFT;
- kvm_pte_t kpte;
- pte_t pte;
if (!kvm->arch.mmu.pgt)
return false;
- WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
-
- kpte = kvm_pgtable_stage2_mkold(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
- pte = __pte(kpte);
- return pte_valid(pte) && pte_young(pte);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, true);
}
bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
+ u64 size = (range->end - range->start) << PAGE_SHIFT;
+
if (!kvm->arch.mmu.pgt)
return false;
- return kvm_pgtable_stage2_is_young(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, false);
}
phys_addr_t kvm_mmu_get_httbr(void)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x df6556adf27b7372cfcd97e1c0afb0d516c8279f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072354-sublet-observing-2ff8@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
df6556adf27b ("KVM: arm64: Correctly handle page aging notifiers for unaligned memslot")
ca5de2448c3b ("KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks")
6b91b8f95cad ("KVM: arm64: Use an opaque type for pteps")
fa002e8e79b3 ("KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data")
2a611c7f87f2 ("KVM: arm64: Pass mm_ops through the visitor context")
83844a2317ec ("KVM: arm64: Stash observed pte value in visitor context")
dfc7a7769ab7 ("KVM: arm64: Combine visitor arguments into a context structure")
094d00f8ca58 ("KVM: arm64: pkvm: Use the mm_ops indirection for cache maintenance")
e82edcc75c4e ("KVM: arm64: Implement do_share() helper for sharing memory")
82bb02445de5 ("KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2")
d6b4bd3f4897 ("KVM: arm64: Fixup hyp stage-1 refcount")
2ea2ff91e822 ("KVM: arm64: Refcount hyp stage-1 pgtable pages")
cf0c7125d578 ("Merge branch kvm-arm64/mmu/el2-tracking into kvmarm-master/next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From df6556adf27b7372cfcd97e1c0afb0d516c8279f Mon Sep 17 00:00:00 2001
From: Oliver Upton <oliver.upton(a)linux.dev>
Date: Tue, 27 Jun 2023 23:54:05 +0000
Subject: [PATCH] KVM: arm64: Correctly handle page aging notifiers for
unaligned memslot
Userspace is allowed to select any PAGE_SIZE aligned hva to back guest
memory. This is even the case with hugepages, although it is a rather
suboptimal configuration as PTE level mappings are used at stage-2.
The arm64 page aging handlers have an assumption that the specified
range is exactly one page/block of memory, which in the aforementioned
case is not necessarily true. All together this leads to the WARN() in
kvm_age_gfn() firing.
However, the WARN is only part of the issue as the table walkers visit
at most a single leaf PTE. For hugepage-backed memory in a memslot that
isn't hugepage-aligned, page aging entirely misses accesses to the
hugepage beyond the first page in the memslot.
Add a new walker dedicated to handling page aging MMU notifiers capable
of walking a range of PTEs. Convert kvm(_test)_age_gfn() over to the new
walker and drop the WARN that caught the issue in the first place. The
implementation of this walker was inspired by the test_clear_young()
implementation by Yu Zhao [*], but repurposed to address a bug in the
existing aging implementation.
Cc: stable(a)vger.kernel.org # v5.15
Fixes: 056aad67f836 ("kvm: arm/arm64: Rework gpa callback handlers")
Link: https://lore.kernel.org/kvmarm/20230526234435.662652-6-yuzhao@google.com/
Co-developed-by: Yu Zhao <yuzhao(a)google.com>
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Reported-by: Reiji Watanabe <reijiw(a)google.com>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Reviewed-by: Shaoqin Huang <shahuang(a)redhat.com>
Link: https://lore.kernel.org/r/20230627235405.4069823-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 8294a9a7e566..929d355eae0a 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -608,22 +608,26 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
/**
- * kvm_pgtable_stage2_mkold() - Clear the access flag in a page-table entry.
+ * kvm_pgtable_stage2_test_clear_young() - Test and optionally clear the access
+ * flag in a page-table entry.
* @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
* @addr: Intermediate physical address to identify the page-table entry.
+ * @size: Size of the address range to visit.
+ * @mkold: True if the access flag should be cleared.
*
* The offset of @addr within a page is ignored.
*
- * If there is a valid, leaf page-table entry used to translate @addr, then
- * clear the access flag in that entry.
+ * Tests and conditionally clears the access flag for every valid, leaf
+ * page-table entry used to translate the range [@addr, @addr + @size).
*
* Note that it is the caller's responsibility to invalidate the TLB after
* calling this function to ensure that the updated permissions are visible
* to the CPUs.
*
- * Return: The old page-table entry prior to clearing the flag, 0 on failure.
+ * Return: True if any of the visited PTEs had the access flag set.
*/
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold);
/**
* kvm_pgtable_stage2_relax_perms() - Relax the permissions enforced by a
@@ -645,18 +649,6 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
enum kvm_pgtable_prot prot);
-/**
- * kvm_pgtable_stage2_is_young() - Test whether a page-table entry has the
- * access flag set.
- * @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
- * @addr: Intermediate physical address to identify the page-table entry.
- *
- * The offset of @addr within a page is ignored.
- *
- * Return: True if the page-table entry has the access flag set, false otherwise.
- */
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr);
-
/**
* kvm_pgtable_stage2_flush_range() - Clean and invalidate data cache to Point
* of Coherency for guest stage-2 address
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index aa740a974e02..f7a93ef29250 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1195,25 +1195,54 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
return pte;
}
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
+struct stage2_age_data {
+ bool mkold;
+ bool young;
+};
+
+static int stage2_age_walker(const struct kvm_pgtable_visit_ctx *ctx,
+ enum kvm_pgtable_walk_flags visit)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF,
- &pte, NULL, 0);
+ kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data *data = ctx->arg;
+
+ if (!kvm_pte_valid(ctx->old) || new == ctx->old)
+ return 0;
+
+ data->young = true;
+
+ /*
+ * stage2_age_walker() is always called while holding the MMU lock for
+ * write, so this will always succeed. Nonetheless, this deliberately
+ * follows the race detection pattern of the other stage-2 walkers in
+ * case the locking mechanics of the MMU notifiers is ever changed.
+ */
+ if (data->mkold && !stage2_try_set_pte(ctx, new))
+ return -EAGAIN;
+
/*
* "But where's the TLBI?!", you scream.
* "Over in the core code", I sigh.
*
* See the '->clear_flush_young()' callback on the KVM mmu notifier.
*/
- return pte;
+ return 0;
}
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, 0);
- return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data data = {
+ .mkold = mkold,
+ };
+ struct kvm_pgtable_walker walker = {
+ .cb = stage2_age_walker,
+ .arg = &data,
+ .flags = KVM_PGTABLE_WALK_LEAF,
+ };
+
+ WARN_ON(kvm_pgtable_walk(pgt, addr, size, &walker));
+ return data.young;
}
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 6db9ef288ec3..d3b4feed460c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1756,27 +1756,25 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
u64 size = (range->end - range->start) << PAGE_SHIFT;
- kvm_pte_t kpte;
- pte_t pte;
if (!kvm->arch.mmu.pgt)
return false;
- WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
-
- kpte = kvm_pgtable_stage2_mkold(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
- pte = __pte(kpte);
- return pte_valid(pte) && pte_young(pte);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, true);
}
bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
+ u64 size = (range->end - range->start) << PAGE_SHIFT;
+
if (!kvm->arch.mmu.pgt)
return false;
- return kvm_pgtable_stage2_is_young(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, false);
}
phys_addr_t kvm_mmu_get_httbr(void)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x df6556adf27b7372cfcd97e1c0afb0d516c8279f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072352-ooze-childcare-fd84@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
df6556adf27b ("KVM: arm64: Correctly handle page aging notifiers for unaligned memslot")
ca5de2448c3b ("KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks")
6b91b8f95cad ("KVM: arm64: Use an opaque type for pteps")
fa002e8e79b3 ("KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data")
2a611c7f87f2 ("KVM: arm64: Pass mm_ops through the visitor context")
83844a2317ec ("KVM: arm64: Stash observed pte value in visitor context")
dfc7a7769ab7 ("KVM: arm64: Combine visitor arguments into a context structure")
094d00f8ca58 ("KVM: arm64: pkvm: Use the mm_ops indirection for cache maintenance")
e82edcc75c4e ("KVM: arm64: Implement do_share() helper for sharing memory")
82bb02445de5 ("KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2")
d6b4bd3f4897 ("KVM: arm64: Fixup hyp stage-1 refcount")
2ea2ff91e822 ("KVM: arm64: Refcount hyp stage-1 pgtable pages")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From df6556adf27b7372cfcd97e1c0afb0d516c8279f Mon Sep 17 00:00:00 2001
From: Oliver Upton <oliver.upton(a)linux.dev>
Date: Tue, 27 Jun 2023 23:54:05 +0000
Subject: [PATCH] KVM: arm64: Correctly handle page aging notifiers for
unaligned memslot
Userspace is allowed to select any PAGE_SIZE aligned hva to back guest
memory. This is even the case with hugepages, although it is a rather
suboptimal configuration as PTE level mappings are used at stage-2.
The arm64 page aging handlers have an assumption that the specified
range is exactly one page/block of memory, which in the aforementioned
case is not necessarily true. All together this leads to the WARN() in
kvm_age_gfn() firing.
However, the WARN is only part of the issue as the table walkers visit
at most a single leaf PTE. For hugepage-backed memory in a memslot that
isn't hugepage-aligned, page aging entirely misses accesses to the
hugepage beyond the first page in the memslot.
Add a new walker dedicated to handling page aging MMU notifiers capable
of walking a range of PTEs. Convert kvm(_test)_age_gfn() over to the new
walker and drop the WARN that caught the issue in the first place. The
implementation of this walker was inspired by the test_clear_young()
implementation by Yu Zhao [*], but repurposed to address a bug in the
existing aging implementation.
Cc: stable(a)vger.kernel.org # v5.15
Fixes: 056aad67f836 ("kvm: arm/arm64: Rework gpa callback handlers")
Link: https://lore.kernel.org/kvmarm/20230526234435.662652-6-yuzhao@google.com/
Co-developed-by: Yu Zhao <yuzhao(a)google.com>
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Reported-by: Reiji Watanabe <reijiw(a)google.com>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Reviewed-by: Shaoqin Huang <shahuang(a)redhat.com>
Link: https://lore.kernel.org/r/20230627235405.4069823-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 8294a9a7e566..929d355eae0a 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -608,22 +608,26 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
/**
- * kvm_pgtable_stage2_mkold() - Clear the access flag in a page-table entry.
+ * kvm_pgtable_stage2_test_clear_young() - Test and optionally clear the access
+ * flag in a page-table entry.
* @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
* @addr: Intermediate physical address to identify the page-table entry.
+ * @size: Size of the address range to visit.
+ * @mkold: True if the access flag should be cleared.
*
* The offset of @addr within a page is ignored.
*
- * If there is a valid, leaf page-table entry used to translate @addr, then
- * clear the access flag in that entry.
+ * Tests and conditionally clears the access flag for every valid, leaf
+ * page-table entry used to translate the range [@addr, @addr + @size).
*
* Note that it is the caller's responsibility to invalidate the TLB after
* calling this function to ensure that the updated permissions are visible
* to the CPUs.
*
- * Return: The old page-table entry prior to clearing the flag, 0 on failure.
+ * Return: True if any of the visited PTEs had the access flag set.
*/
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold);
/**
* kvm_pgtable_stage2_relax_perms() - Relax the permissions enforced by a
@@ -645,18 +649,6 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
enum kvm_pgtable_prot prot);
-/**
- * kvm_pgtable_stage2_is_young() - Test whether a page-table entry has the
- * access flag set.
- * @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
- * @addr: Intermediate physical address to identify the page-table entry.
- *
- * The offset of @addr within a page is ignored.
- *
- * Return: True if the page-table entry has the access flag set, false otherwise.
- */
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr);
-
/**
* kvm_pgtable_stage2_flush_range() - Clean and invalidate data cache to Point
* of Coherency for guest stage-2 address
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index aa740a974e02..f7a93ef29250 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1195,25 +1195,54 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
return pte;
}
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
+struct stage2_age_data {
+ bool mkold;
+ bool young;
+};
+
+static int stage2_age_walker(const struct kvm_pgtable_visit_ctx *ctx,
+ enum kvm_pgtable_walk_flags visit)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF,
- &pte, NULL, 0);
+ kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data *data = ctx->arg;
+
+ if (!kvm_pte_valid(ctx->old) || new == ctx->old)
+ return 0;
+
+ data->young = true;
+
+ /*
+ * stage2_age_walker() is always called while holding the MMU lock for
+ * write, so this will always succeed. Nonetheless, this deliberately
+ * follows the race detection pattern of the other stage-2 walkers in
+ * case the locking mechanics of the MMU notifiers is ever changed.
+ */
+ if (data->mkold && !stage2_try_set_pte(ctx, new))
+ return -EAGAIN;
+
/*
* "But where's the TLBI?!", you scream.
* "Over in the core code", I sigh.
*
* See the '->clear_flush_young()' callback on the KVM mmu notifier.
*/
- return pte;
+ return 0;
}
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, 0);
- return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data data = {
+ .mkold = mkold,
+ };
+ struct kvm_pgtable_walker walker = {
+ .cb = stage2_age_walker,
+ .arg = &data,
+ .flags = KVM_PGTABLE_WALK_LEAF,
+ };
+
+ WARN_ON(kvm_pgtable_walk(pgt, addr, size, &walker));
+ return data.young;
}
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 6db9ef288ec3..d3b4feed460c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1756,27 +1756,25 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
u64 size = (range->end - range->start) << PAGE_SHIFT;
- kvm_pte_t kpte;
- pte_t pte;
if (!kvm->arch.mmu.pgt)
return false;
- WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
-
- kpte = kvm_pgtable_stage2_mkold(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
- pte = __pte(kpte);
- return pte_valid(pte) && pte_young(pte);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, true);
}
bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
+ u64 size = (range->end - range->start) << PAGE_SHIFT;
+
if (!kvm->arch.mmu.pgt)
return false;
- return kvm_pgtable_stage2_is_young(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, false);
}
phys_addr_t kvm_mmu_get_httbr(void)
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x df6556adf27b7372cfcd97e1c0afb0d516c8279f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072351-slackness-unlearned-6f45@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
df6556adf27b ("KVM: arm64: Correctly handle page aging notifiers for unaligned memslot")
ca5de2448c3b ("KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks")
6b91b8f95cad ("KVM: arm64: Use an opaque type for pteps")
fa002e8e79b3 ("KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data")
2a611c7f87f2 ("KVM: arm64: Pass mm_ops through the visitor context")
83844a2317ec ("KVM: arm64: Stash observed pte value in visitor context")
dfc7a7769ab7 ("KVM: arm64: Combine visitor arguments into a context structure")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From df6556adf27b7372cfcd97e1c0afb0d516c8279f Mon Sep 17 00:00:00 2001
From: Oliver Upton <oliver.upton(a)linux.dev>
Date: Tue, 27 Jun 2023 23:54:05 +0000
Subject: [PATCH] KVM: arm64: Correctly handle page aging notifiers for
unaligned memslot
Userspace is allowed to select any PAGE_SIZE aligned hva to back guest
memory. This is even the case with hugepages, although it is a rather
suboptimal configuration as PTE level mappings are used at stage-2.
The arm64 page aging handlers have an assumption that the specified
range is exactly one page/block of memory, which in the aforementioned
case is not necessarily true. All together this leads to the WARN() in
kvm_age_gfn() firing.
However, the WARN is only part of the issue as the table walkers visit
at most a single leaf PTE. For hugepage-backed memory in a memslot that
isn't hugepage-aligned, page aging entirely misses accesses to the
hugepage beyond the first page in the memslot.
Add a new walker dedicated to handling page aging MMU notifiers capable
of walking a range of PTEs. Convert kvm(_test)_age_gfn() over to the new
walker and drop the WARN that caught the issue in the first place. The
implementation of this walker was inspired by the test_clear_young()
implementation by Yu Zhao [*], but repurposed to address a bug in the
existing aging implementation.
Cc: stable(a)vger.kernel.org # v5.15
Fixes: 056aad67f836 ("kvm: arm/arm64: Rework gpa callback handlers")
Link: https://lore.kernel.org/kvmarm/20230526234435.662652-6-yuzhao@google.com/
Co-developed-by: Yu Zhao <yuzhao(a)google.com>
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Reported-by: Reiji Watanabe <reijiw(a)google.com>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Reviewed-by: Shaoqin Huang <shahuang(a)redhat.com>
Link: https://lore.kernel.org/r/20230627235405.4069823-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 8294a9a7e566..929d355eae0a 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -608,22 +608,26 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
/**
- * kvm_pgtable_stage2_mkold() - Clear the access flag in a page-table entry.
+ * kvm_pgtable_stage2_test_clear_young() - Test and optionally clear the access
+ * flag in a page-table entry.
* @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
* @addr: Intermediate physical address to identify the page-table entry.
+ * @size: Size of the address range to visit.
+ * @mkold: True if the access flag should be cleared.
*
* The offset of @addr within a page is ignored.
*
- * If there is a valid, leaf page-table entry used to translate @addr, then
- * clear the access flag in that entry.
+ * Tests and conditionally clears the access flag for every valid, leaf
+ * page-table entry used to translate the range [@addr, @addr + @size).
*
* Note that it is the caller's responsibility to invalidate the TLB after
* calling this function to ensure that the updated permissions are visible
* to the CPUs.
*
- * Return: The old page-table entry prior to clearing the flag, 0 on failure.
+ * Return: True if any of the visited PTEs had the access flag set.
*/
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold);
/**
* kvm_pgtable_stage2_relax_perms() - Relax the permissions enforced by a
@@ -645,18 +649,6 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
enum kvm_pgtable_prot prot);
-/**
- * kvm_pgtable_stage2_is_young() - Test whether a page-table entry has the
- * access flag set.
- * @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
- * @addr: Intermediate physical address to identify the page-table entry.
- *
- * The offset of @addr within a page is ignored.
- *
- * Return: True if the page-table entry has the access flag set, false otherwise.
- */
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr);
-
/**
* kvm_pgtable_stage2_flush_range() - Clean and invalidate data cache to Point
* of Coherency for guest stage-2 address
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index aa740a974e02..f7a93ef29250 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1195,25 +1195,54 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
return pte;
}
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
+struct stage2_age_data {
+ bool mkold;
+ bool young;
+};
+
+static int stage2_age_walker(const struct kvm_pgtable_visit_ctx *ctx,
+ enum kvm_pgtable_walk_flags visit)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF,
- &pte, NULL, 0);
+ kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data *data = ctx->arg;
+
+ if (!kvm_pte_valid(ctx->old) || new == ctx->old)
+ return 0;
+
+ data->young = true;
+
+ /*
+ * stage2_age_walker() is always called while holding the MMU lock for
+ * write, so this will always succeed. Nonetheless, this deliberately
+ * follows the race detection pattern of the other stage-2 walkers in
+ * case the locking mechanics of the MMU notifiers is ever changed.
+ */
+ if (data->mkold && !stage2_try_set_pte(ctx, new))
+ return -EAGAIN;
+
/*
* "But where's the TLBI?!", you scream.
* "Over in the core code", I sigh.
*
* See the '->clear_flush_young()' callback on the KVM mmu notifier.
*/
- return pte;
+ return 0;
}
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, 0);
- return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data data = {
+ .mkold = mkold,
+ };
+ struct kvm_pgtable_walker walker = {
+ .cb = stage2_age_walker,
+ .arg = &data,
+ .flags = KVM_PGTABLE_WALK_LEAF,
+ };
+
+ WARN_ON(kvm_pgtable_walk(pgt, addr, size, &walker));
+ return data.young;
}
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 6db9ef288ec3..d3b4feed460c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1756,27 +1756,25 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
u64 size = (range->end - range->start) << PAGE_SHIFT;
- kvm_pte_t kpte;
- pte_t pte;
if (!kvm->arch.mmu.pgt)
return false;
- WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
-
- kpte = kvm_pgtable_stage2_mkold(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
- pte = __pte(kpte);
- return pte_valid(pte) && pte_young(pte);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, true);
}
bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
+ u64 size = (range->end - range->start) << PAGE_SHIFT;
+
if (!kvm->arch.mmu.pgt)
return false;
- return kvm_pgtable_stage2_is_young(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, false);
}
phys_addr_t kvm_mmu_get_httbr(void)
Recently Luiz Capitulino reported BPF test failure for kernel version
6.1.36 (see [7]). The following test_verifier test failed:
"precise: ST insn causing spi > allocated_stack".
After back-port of the following upstream commit:
ecdf985d7615 ("bpf: track immediate values written to stack by BPF_ST instruction")
Investigation in [8] shows that test failure is not a bug, but a
difference in BPF verifier behavior between upstream, where commits
[1,2,3] by Andrii Nakryiko are present, and 6.1.36, where these
commits are absent. Both Luiz and Greg suggested back-porting [1,2,3]
from upstream to avoid divergences.
Commits [1,2,3] break test_progs selftest "align/packet variable offset",
commit [4] fixes this selftest.
I did some additional testing using the following compiler versions:
- Kernel compilation
- gcc version 11.3.0
- BPF tests compilation
- clang version 16.0.6
- clang version 17.0.0 (fa46feb31481)
And identified a few more failing BPF selftests:
- Tests failing with LLVM 16:
- test_verifier:
- precise: ST insn causing spi > allocated_stack FAIL (fixed by [1,2,3])
- test_progs:
- sk_assign (fixed by [6])
- Tests failing with LLVM 17:
- test_verifier:
- precise: ST insn causing spi > allocated_stack FAIL (fixed by [1,2,3])
- test_progs:
- fexit_bpf2bpf/func_replace_verify (fixed by [5])
- fexit_bpf2bpf/func_replace_return_code (fixed by [5])
- sk_assign (fixed by [6])
Commits [4,5,6] only apply to BPF selftests and don't change verifier
behavior.
After applying all of the listed commits I have test_verifier,
test_progs, test_progs-no_alu32 and test_maps passing on my x86 setup,
both for LLVM 16 and LLVM 17.
Upstream commits in chronological order:
[1] be2ef8161572 ("bpf: allow precision tracking for programs with subprogs")
[2] f63181b6ae79 ("bpf: stop setting precise in current state")
[3] 7a830b53c17b ("bpf: aggressively forget precise markings during state checkpointing")
[4] 4f999b767769 ("selftests/bpf: make test_align selftest more robust")
[5] 63d78b7e8ca2 ("selftests/bpf: Workaround verification failure for fexit_bpf2bpf/func_replace_return_code")
[6] 7ce878ca81bc ("selftests/bpf: Fix sk_assign on s390x")
Links:
[7] https://lore.kernel.org/stable/935c4751-d368-df29-33a6-9f4fcae720fa@amazon.…
[8] https://lore.kernel.org/stable/c9b10a8a551edafdfec855fbd35757c6238ad258.cam…
Reported-by: Luiz Capitulino <luizcap(a)amazon.com>
Andrii Nakryiko (4):
bpf: allow precision tracking for programs with subprogs
bpf: stop setting precise in current state
bpf: aggressively forget precise markings during state checkpointing
selftests/bpf: make test_align selftest more robust
Ilya Leoshkevich (1):
selftests/bpf: Fix sk_assign on s390x
Yonghong Song (1):
selftests/bpf: Workaround verification failure for
fexit_bpf2bpf/func_replace_return_code
kernel/bpf/verifier.c | 202 ++++++++++++++++--
.../testing/selftests/bpf/prog_tests/align.c | 38 ++--
.../selftests/bpf/prog_tests/sk_assign.c | 25 ++-
.../selftests/bpf/progs/connect4_prog.c | 2 +-
.../selftests/bpf/progs/test_sk_assign.c | 11 +
.../bpf/progs/test_sk_assign_libbpf.c | 3 +
6 files changed, 247 insertions(+), 34 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/test_sk_assign_libbpf.c
--
2.41.0
Seen in arm:allmodconfig and arm64:allmodconfig builds with v5.4.249-278-g78f9a3d1c959.
sound/soc/mediatek/mt8173/mt8173-afe-pcm.c: In function 'mt8173_afe_pcm_dev_probe':
sound/soc/mediatek/mt8173/mt8173-afe-pcm.c:1159:17: error: label 'err_cleanup_components' used but not defined
Guenter
From: Mohamed Khalfella <mkhalfella(a)purestorage.com>
Commit 6018b585e8c6 ("tracing/histograms: Add histograms to hist_vars if
they have referenced variables") added a check to fail histogram creation
if save_hist_vars() failed to add histogram to hist_vars list. But the
commit failed to set ret to failed return code before jumping to
unregister histogram, fix it.
Link: https://lore.kernel.org/linux-trace-kernel/20230714203341.51396-1-mkhalfell…
Cc: stable(a)vger.kernel.org
Fixes: 6018b585e8c6 ("tracing/histograms: Add histograms to hist_vars if they have referenced variables")
Signed-off-by: Mohamed Khalfella <mkhalfella(a)purestorage.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace_events_hist.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index c8c61381eba4..d06938ae0717 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -6668,7 +6668,8 @@ static int event_hist_trigger_parse(struct event_command *cmd_ops,
goto out_unreg;
if (has_hist_vars(hist_data) || hist_data->n_var_refs) {
- if (save_hist_vars(hist_data))
+ ret = save_hist_vars(hist_data);
+ if (ret)
goto out_unreg;
}
--
2.40.1
`rustc` outputs by default the temporary files (i.e. the ones saved
by `-Csave-temps`, such as `*.rcgu*` files) in the current working
directory when `-o` and `--out-dir` are not given (even if
`--emit=x=path` is given, i.e. it does not use those for temporaries).
Since out-of-tree modules are compiled from the `linux` tree,
`rustc` then tries to create them there, which may not be accessible.
Thus pass `--out-dir` explicitly, even if it is just for the temporary
files.
Reported-by: Raphael Nestler <raphael.nestler(a)gmail.com>
Closes: https://github.com/Rust-for-Linux/linux/issues/1015
Reported-by: Andrea Righi <andrea.righi(a)canonical.com>
Tested-by: Raphael Nestler <raphael.nestler(a)gmail.com>
Tested-by: Andrea Righi <andrea.righi(a)canonical.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Miguel Ojeda <ojeda(a)kernel.org>
---
scripts/Makefile.build | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 6413342a03f4..82e3fb19fdaf 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -264,6 +264,9 @@ $(obj)/%.lst: $(src)/%.c FORCE
rust_allowed_features := new_uninit
+# `--out-dir` is required to avoid temporaries being created by `rustc` in the
+# current working directory, which may be not accessible in the out-of-tree
+# modules case.
rust_common_cmd = \
RUST_MODFILE=$(modfile) $(RUSTC_OR_CLIPPY) $(rust_flags) \
-Zallow-features=$(rust_allowed_features) \
@@ -272,7 +275,7 @@ rust_common_cmd = \
--extern alloc --extern kernel \
--crate-type rlib -L $(objtree)/rust/ \
--crate-name $(basename $(notdir $@)) \
- --emit=dep-info=$(depfile)
+ --out-dir $(dir $@) --emit=dep-info=$(depfile)
# `--emit=obj`, `--emit=asm` and `--emit=llvm-ir` imply a single codegen unit
# will be used. We explicitly request `-Ccodegen-units=1` in any case, and
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
--
2.41.0
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x e51df4f81b02bcdd828a04de7c1eb6a92988b61e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072307-art-revocable-bd1b@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
e51df4f81b02 ("ASoC: cs42l51: fix driver to properly autoload with automatic module loading")
4a4043456cb8 ("ASoC: cs*: use simple i2c probe function")
f8593e885400 ("ASoC: cs42l42: Handle system suspend")
5982b5a8ec7d ("ASoC: cs42l42: Change jack_detect_mutex to a lock of all IRQ handling")
8d06f797f844 ("ASoC: cs42l42: Report full jack status when plug is detected")
a319cb32e7cf ("ASoC: cs4265: Add a remove() function")
fdd535283779 ("ASoC: cs42l42: Report initial jack state")
c778c01d3e66 ("ASoC: cs42l42: Remove unused runtime_suspend/runtime_resume callbacks")
bfceb9c21601 ("Merge branch 'asoc-5.15' into asoc-5.16")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e51df4f81b02bcdd828a04de7c1eb6a92988b61e Mon Sep 17 00:00:00 2001
From: Thomas Petazzoni <thomas.petazzoni(a)bootlin.com>
Date: Thu, 13 Jul 2023 13:21:12 +0200
Subject: [PATCH] ASoC: cs42l51: fix driver to properly autoload with automatic
module loading
In commit 2cb1e0259f50 ("ASoC: cs42l51: re-hook of_match_table
pointer"), 9 years ago, some random guy fixed the cs42l51 after it was
split into a core part and an I2C part to properly match based on a
Device Tree compatible string.
However, the fix in this commit is wrong: the MODULE_DEVICE_TABLE(of,
....) is in the core part of the driver, not the I2C part. Therefore,
automatic module loading based on module.alias, based on matching with
the DT compatible string, loads the core part of the driver, but not
the I2C part. And threfore, the i2c_driver is not registered, and the
codec is not known to the system, nor matched with a DT node with the
corresponding compatible string.
In order to fix that, we move the MODULE_DEVICE_TABLE(of, ...) into
the I2C part of the driver. The cs42l51_of_match[] array is also moved
as well, as it is not possible to have this definition in one file,
and the MODULE_DEVICE_TABLE(of, ...) invocation in another file, due
to how MODULE_DEVICE_TABLE works.
Thanks to this commit, the I2C part of the driver now properly
autoloads, and thanks to its dependency on the core part, the core
part gets autoloaded as well, resulting in a functional sound card
without having to manually load kernel modules.
Fixes: 2cb1e0259f50 ("ASoC: cs42l51: re-hook of_match_table pointer")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Petazzoni <thomas.petazzoni(a)bootlin.com>
Link: https://lore.kernel.org/r/20230713112112.778576-1-thomas.petazzoni@bootlin.…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/sound/soc/codecs/cs42l51-i2c.c b/sound/soc/codecs/cs42l51-i2c.c
index b2106ff6a7cb..e7db7bcd0296 100644
--- a/sound/soc/codecs/cs42l51-i2c.c
+++ b/sound/soc/codecs/cs42l51-i2c.c
@@ -19,6 +19,12 @@ static struct i2c_device_id cs42l51_i2c_id[] = {
};
MODULE_DEVICE_TABLE(i2c, cs42l51_i2c_id);
+const struct of_device_id cs42l51_of_match[] = {
+ { .compatible = "cirrus,cs42l51", },
+ { }
+};
+MODULE_DEVICE_TABLE(of, cs42l51_of_match);
+
static int cs42l51_i2c_probe(struct i2c_client *i2c)
{
struct regmap_config config;
diff --git a/sound/soc/codecs/cs42l51.c b/sound/soc/codecs/cs42l51.c
index a67cd3ee84e0..a7079ae0ca09 100644
--- a/sound/soc/codecs/cs42l51.c
+++ b/sound/soc/codecs/cs42l51.c
@@ -823,13 +823,6 @@ int __maybe_unused cs42l51_resume(struct device *dev)
}
EXPORT_SYMBOL_GPL(cs42l51_resume);
-const struct of_device_id cs42l51_of_match[] = {
- { .compatible = "cirrus,cs42l51", },
- { }
-};
-MODULE_DEVICE_TABLE(of, cs42l51_of_match);
-EXPORT_SYMBOL_GPL(cs42l51_of_match);
-
MODULE_AUTHOR("Arnaud Patard <arnaud.patard(a)rtp-net.org>");
MODULE_DESCRIPTION("Cirrus Logic CS42L51 ALSA SoC Codec Driver");
MODULE_LICENSE("GPL");
diff --git a/sound/soc/codecs/cs42l51.h b/sound/soc/codecs/cs42l51.h
index a79343e8a54e..125703ede113 100644
--- a/sound/soc/codecs/cs42l51.h
+++ b/sound/soc/codecs/cs42l51.h
@@ -16,7 +16,6 @@ int cs42l51_probe(struct device *dev, struct regmap *regmap);
void cs42l51_remove(struct device *dev);
int __maybe_unused cs42l51_suspend(struct device *dev);
int __maybe_unused cs42l51_resume(struct device *dev);
-extern const struct of_device_id cs42l51_of_match[];
#define CS42L51_CHIP_ID 0x1B
#define CS42L51_CHIP_REV_A 0x00
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 2a9482e55968ed7368afaa9c2133404069117320
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072302-unengaged-strum-b2ad@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
2a9482e55968 ("drm/amd/display: Prevent vtotal from being set to 0")
1a4bcdbea431 ("drm/amd/display: Fix possible underflow for displays with large vblank")
469a62938a45 ("drm/amd/display: update extended blank for dcn314 onwards")
e3416e872f84 ("drm/amd/display: Add FAMS validation before trying to use it")
0db13eae41fc ("drm/amd/display: Add minimum Z8 residency debug option")
73dd4ca4b5a0 ("drm/amd/display: Fix Z8 support configurations")
db4107e92a81 ("drm/amd/display: fix dc/core/dc.c kernel-doc")
00812bfc7bcb ("drm/amd/display: Add debug option to skip PSR CRTC disable")
80676936805e ("drm/amd/display: Add Z8 allow states to z-state support list")
e366f36958f6 ("drm/amd/display: Rework comments on dc file")
bd829d570773 ("drm/amd/display: Refactor eDP PSR codes")
6f4f8ff567c4 ("drm/amd/display: Display does not light up after S4 resume")
4f5bdde386d3 ("drm/amd/display: Update PMFW z-state interface for DCN314")
8cab4ef0ad95 ("drm/amd/display: Keep OTG on when Z10 is disable")
1178ac68dc28 ("drm/amd/display: Refactor edp ILR caps codes")
5d4b59146078 ("drm/amd/display: Add ABM control to panel_config struct.")
936675464b1f ("drm/amd/display: Add debug option for exiting idle optimizations on cursor updates")
eccff6cdde6f ("drm/amd/display: Refactor edp panel power sequencer(PPS) codes")
9f6f6be163df ("drm/amd/display: Add comments.")
c17a34e0526f ("drm/amd/display: Refactor edp dsc codes.")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2a9482e55968ed7368afaa9c2133404069117320 Mon Sep 17 00:00:00 2001
From: Daniel Miess <daniel.miess(a)amd.com>
Date: Thu, 22 Jun 2023 08:11:48 -0400
Subject: [PATCH] drm/amd/display: Prevent vtotal from being set to 0
[Why]
In dcn314 DML the destination pipe vtotal was being set
to the crtc adjustment vtotal_min value even in cases
where that value is 0.
[How]
Only set vtotal to the crtc adjustment vtotal_min value
in cases where the value is non-zero.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com>
Acked-by: Alan Liu <haoping.liu(a)amd.com>
Signed-off-by: Daniel Miess <daniel.miess(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
index d9e049e7ff0a..ed8ddb75b333 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
@@ -295,7 +295,11 @@ int dcn314_populate_dml_pipes_from_context_fpu(struct dc *dc, struct dc_state *c
pipe = &res_ctx->pipe_ctx[i];
timing = &pipe->stream->timing;
- pipes[pipe_cnt].pipe.dest.vtotal = pipe->stream->adjust.v_total_min;
+ if (pipe->stream->adjust.v_total_min != 0)
+ pipes[pipe_cnt].pipe.dest.vtotal = pipe->stream->adjust.v_total_min;
+ else
+ pipes[pipe_cnt].pipe.dest.vtotal = timing->v_total;
+
pipes[pipe_cnt].pipe.dest.vblank_nom = timing->v_total - pipes[pipe_cnt].pipe.dest.vactive;
pipes[pipe_cnt].pipe.dest.vblank_nom = min(pipes[pipe_cnt].pipe.dest.vblank_nom, dcn3_14_ip.VBlankNomDefaultUS);
pipes[pipe_cnt].pipe.dest.vblank_nom = max(pipes[pipe_cnt].pipe.dest.vblank_nom, timing->v_sync_width);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 2a9482e55968ed7368afaa9c2133404069117320
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072301-urging-upheld-1851@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
2a9482e55968 ("drm/amd/display: Prevent vtotal from being set to 0")
1a4bcdbea431 ("drm/amd/display: Fix possible underflow for displays with large vblank")
469a62938a45 ("drm/amd/display: update extended blank for dcn314 onwards")
e3416e872f84 ("drm/amd/display: Add FAMS validation before trying to use it")
0db13eae41fc ("drm/amd/display: Add minimum Z8 residency debug option")
73dd4ca4b5a0 ("drm/amd/display: Fix Z8 support configurations")
db4107e92a81 ("drm/amd/display: fix dc/core/dc.c kernel-doc")
00812bfc7bcb ("drm/amd/display: Add debug option to skip PSR CRTC disable")
80676936805e ("drm/amd/display: Add Z8 allow states to z-state support list")
e366f36958f6 ("drm/amd/display: Rework comments on dc file")
bd829d570773 ("drm/amd/display: Refactor eDP PSR codes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2a9482e55968ed7368afaa9c2133404069117320 Mon Sep 17 00:00:00 2001
From: Daniel Miess <daniel.miess(a)amd.com>
Date: Thu, 22 Jun 2023 08:11:48 -0400
Subject: [PATCH] drm/amd/display: Prevent vtotal from being set to 0
[Why]
In dcn314 DML the destination pipe vtotal was being set
to the crtc adjustment vtotal_min value even in cases
where that value is 0.
[How]
Only set vtotal to the crtc adjustment vtotal_min value
in cases where the value is non-zero.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com>
Acked-by: Alan Liu <haoping.liu(a)amd.com>
Signed-off-by: Daniel Miess <daniel.miess(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
index d9e049e7ff0a..ed8ddb75b333 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
@@ -295,7 +295,11 @@ int dcn314_populate_dml_pipes_from_context_fpu(struct dc *dc, struct dc_state *c
pipe = &res_ctx->pipe_ctx[i];
timing = &pipe->stream->timing;
- pipes[pipe_cnt].pipe.dest.vtotal = pipe->stream->adjust.v_total_min;
+ if (pipe->stream->adjust.v_total_min != 0)
+ pipes[pipe_cnt].pipe.dest.vtotal = pipe->stream->adjust.v_total_min;
+ else
+ pipes[pipe_cnt].pipe.dest.vtotal = timing->v_total;
+
pipes[pipe_cnt].pipe.dest.vblank_nom = timing->v_total - pipes[pipe_cnt].pipe.dest.vactive;
pipes[pipe_cnt].pipe.dest.vblank_nom = min(pipes[pipe_cnt].pipe.dest.vblank_nom, dcn3_14_ip.VBlankNomDefaultUS);
pipes[pipe_cnt].pipe.dest.vblank_nom = max(pipes[pipe_cnt].pipe.dest.vblank_nom, timing->v_sync_width);
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 2a9482e55968ed7368afaa9c2133404069117320
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072300-thwarting-luckiness-18bd@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2a9482e55968ed7368afaa9c2133404069117320 Mon Sep 17 00:00:00 2001
From: Daniel Miess <daniel.miess(a)amd.com>
Date: Thu, 22 Jun 2023 08:11:48 -0400
Subject: [PATCH] drm/amd/display: Prevent vtotal from being set to 0
[Why]
In dcn314 DML the destination pipe vtotal was being set
to the crtc adjustment vtotal_min value even in cases
where that value is 0.
[How]
Only set vtotal to the crtc adjustment vtotal_min value
in cases where the value is non-zero.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com>
Acked-by: Alan Liu <haoping.liu(a)amd.com>
Signed-off-by: Daniel Miess <daniel.miess(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
index d9e049e7ff0a..ed8ddb75b333 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
@@ -295,7 +295,11 @@ int dcn314_populate_dml_pipes_from_context_fpu(struct dc *dc, struct dc_state *c
pipe = &res_ctx->pipe_ctx[i];
timing = &pipe->stream->timing;
- pipes[pipe_cnt].pipe.dest.vtotal = pipe->stream->adjust.v_total_min;
+ if (pipe->stream->adjust.v_total_min != 0)
+ pipes[pipe_cnt].pipe.dest.vtotal = pipe->stream->adjust.v_total_min;
+ else
+ pipes[pipe_cnt].pipe.dest.vtotal = timing->v_total;
+
pipes[pipe_cnt].pipe.dest.vblank_nom = timing->v_total - pipes[pipe_cnt].pipe.dest.vactive;
pipes[pipe_cnt].pipe.dest.vblank_nom = min(pipes[pipe_cnt].pipe.dest.vblank_nom, dcn3_14_ip.VBlankNomDefaultUS);
pipes[pipe_cnt].pipe.dest.vblank_nom = max(pipes[pipe_cnt].pipe.dest.vblank_nom, timing->v_sync_width);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x b42ae87a7b3878afaf4c3852ca66c025a5b996e0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072344-gyration-coveted-1049@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
b42ae87a7b38 ("drm/amdgpu/vkms: relax timer deactivation by hrtimer_try_to_cancel")
deefd07eedb7 ("drm/amdgpu: fix vkms crtc settings")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b42ae87a7b3878afaf4c3852ca66c025a5b996e0 Mon Sep 17 00:00:00 2001
From: Guchun Chen <guchun.chen(a)amd.com>
Date: Thu, 6 Jul 2023 15:57:21 +0800
Subject: [PATCH] drm/amdgpu/vkms: relax timer deactivation by
hrtimer_try_to_cancel
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In below thousands of screen rotation loop tests with virtual display
enabled, a CPU hard lockup issue may happen, leading system to unresponsive
and crash.
do {
xrandr --output Virtual --rotate inverted
xrandr --output Virtual --rotate right
xrandr --output Virtual --rotate left
xrandr --output Virtual --rotate normal
} while (1);
NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
? hrtimer_run_softirq+0x140/0x140
? store_vblank+0xe0/0xe0 [drm]
hrtimer_cancel+0x15/0x30
amdgpu_vkms_disable_vblank+0x15/0x30 [amdgpu]
drm_vblank_disable_and_save+0x185/0x1f0 [drm]
drm_crtc_vblank_off+0x159/0x4c0 [drm]
? record_print_text.cold+0x11/0x11
? wait_for_completion_timeout+0x232/0x280
? drm_crtc_wait_one_vblank+0x40/0x40 [drm]
? bit_wait_io_timeout+0xe0/0xe0
? wait_for_completion_interruptible+0x1d7/0x320
? mutex_unlock+0x81/0xd0
amdgpu_vkms_crtc_atomic_disable
It's caused by a stuck in lock dependency in such scenario on different
CPUs.
CPU1 CPU2
drm_crtc_vblank_off hrtimer_interrupt
grab event_lock (irq disabled) __hrtimer_run_queues
grab vbl_lock/vblank_time_block amdgpu_vkms_vblank_simulate
amdgpu_vkms_disable_vblank drm_handle_vblank
hrtimer_cancel grab dev->event_lock
So CPU1 stucks in hrtimer_cancel as timer callback is running endless on
current clock base, as that timer queue on CPU2 has no chance to finish it
because of failing to hold the lock. So NMI watchdog will throw the errors
after its threshold, and all later CPUs are impacted/blocked.
So use hrtimer_try_to_cancel to fix this, as disable_vblank callback
does not need to wait the handler to finish. And also it's not necessary
to check the return value of hrtimer_try_to_cancel, because even if it's
-1 which means current timer callback is running, it will be reprogrammed
in hrtimer_start with calling enable_vblank to make it works.
v2: only re-arm timer when vblank is enabled (Christian) and add a Fixes
tag as well
v3: drop warn printing (Christian)
v4: drop superfluous check of blank->enabled in timer function, as it's
guaranteed in drm_handle_vblank (Christian)
Fixes: 84ec374bd580 ("drm/amdgpu: create amdgpu_vkms (v4)")
Cc: stable(a)vger.kernel.org
Suggested-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Guchun Chen <guchun.chen(a)amd.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
index 53ff91fc6cf6..d0748bcfad16 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
@@ -55,8 +55,9 @@ static enum hrtimer_restart amdgpu_vkms_vblank_simulate(struct hrtimer *timer)
DRM_WARN("%s: vblank timer overrun\n", __func__);
ret = drm_crtc_handle_vblank(crtc);
+ /* Don't queue timer again when vblank is disabled. */
if (!ret)
- DRM_ERROR("amdgpu_vkms failure on handling vblank");
+ return HRTIMER_NORESTART;
return HRTIMER_RESTART;
}
@@ -81,7 +82,7 @@ static void amdgpu_vkms_disable_vblank(struct drm_crtc *crtc)
{
struct amdgpu_crtc *amdgpu_crtc = to_amdgpu_crtc(crtc);
- hrtimer_cancel(&amdgpu_crtc->vblank_timer);
+ hrtimer_try_to_cancel(&amdgpu_crtc->vblank_timer);
}
static bool amdgpu_vkms_get_vblank_timestamp(struct drm_crtc *crtc,
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 5886e4d5ecec3e22844efed90b2dd383ef804b3a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072302-hasty-mocker-848a@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
5886e4d5ecec ("can: gs_usb: fix time stamp counter initialization")
2603be9e8167 ("can: gs_usb: gs_can_open(): improve error handling")
56c56a309e79 ("can: gs_usb: remove gs_can::iface")
0c9f92a4b795 ("can: gs_usb: add support for reading error counters")
2f3cdad1c616 ("can: gs_usb: add ability to enable / disable berr reporting")
ac3f25824e4f ("can: gs_usb: gs_can_open(): merge setting of timestamp flags and init")
f6adf410f70b ("can: gs_usb: gs_can_open(): sort checks for ctrlmode")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5886e4d5ecec3e22844efed90b2dd383ef804b3a Mon Sep 17 00:00:00 2001
From: Marc Kleine-Budde <mkl(a)pengutronix.de>
Date: Fri, 7 Jul 2023 18:44:23 +0200
Subject: [PATCH] can: gs_usb: fix time stamp counter initialization
If the gs_usb device driver is unloaded (or unbound) before the
interface is shut down, the USB stack first calls the struct
usb_driver::disconnect and then the struct net_device_ops::ndo_stop
callback.
In gs_usb_disconnect() all pending bulk URBs are killed, i.e. no more
RX'ed CAN frames are send from the USB device to the host. Later in
gs_can_close() a reset control message is send to each CAN channel to
remove the controller from the CAN bus. In this race window the USB
device can still receive CAN frames from the bus and internally queue
them to be send to the host.
At least in the current version of the candlelight firmware, the queue
of received CAN frames is not emptied during the reset command. After
loading (or binding) the gs_usb driver, new URBs are submitted during
the struct net_device_ops::ndo_open callback and the candlelight
firmware starts sending its already queued CAN frames to the host.
However, this scenario was not considered when implementing the
hardware timestamp function. The cycle counter/time counter
infrastructure is set up (gs_usb_timestamp_init()) after the USBs are
submitted, resulting in a NULL pointer dereference if
timecounter_cyc2time() (via the call chain:
gs_usb_receive_bulk_callback() -> gs_usb_set_timestamp() ->
gs_usb_skb_set_timestamp()) is called too early.
Move the gs_usb_timestamp_init() function before the URBs are
submitted to fix this problem.
For a comprehensive solution, we need to consider gs_usb devices with
more than 1 channel. The cycle counter/time counter infrastructure is
setup per channel, but the RX URBs are per device. Once gs_can_open()
of _a_ channel has been called, and URBs have been submitted, the
gs_usb_receive_bulk_callback() can be called for _all_ available
channels, even for channels that are not running, yet. As cycle
counter/time counter has not set up, this will again lead to a NULL
pointer dereference.
Convert the cycle counter/time counter from a "per channel" to a "per
device" functionality. Also set it up, before submitting any URBs to
the device.
Further in gs_usb_receive_bulk_callback(), don't process any URBs for
not started CAN channels, only resubmit the URB.
Fixes: 45dfa45f52e6 ("can: gs_usb: add RX and TX hardware timestamp support")
Closes: https://github.com/candle-usb/candleLight_fw/issues/137#issuecomment-162353…
Cc: stable(a)vger.kernel.org
Cc: John Whittington <git(a)jbrengineering.co.uk>
Link: https://lore.kernel.org/all/20230716-gs_usb-fix-time-stamp-counter-v1-2-901…
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/drivers/net/can/usb/gs_usb.c b/drivers/net/can/usb/gs_usb.c
index 85b7b59c8426..f418066569fc 100644
--- a/drivers/net/can/usb/gs_usb.c
+++ b/drivers/net/can/usb/gs_usb.c
@@ -303,12 +303,6 @@ struct gs_can {
struct can_bittiming_const bt_const, data_bt_const;
unsigned int channel; /* channel number */
- /* time counter for hardware timestamps */
- struct cyclecounter cc;
- struct timecounter tc;
- spinlock_t tc_lock; /* spinlock to guard access tc->cycle_last */
- struct delayed_work timestamp;
-
u32 feature;
unsigned int hf_size_tx;
@@ -325,6 +319,13 @@ struct gs_usb {
struct gs_can *canch[GS_MAX_INTF];
struct usb_anchor rx_submitted;
struct usb_device *udev;
+
+ /* time counter for hardware timestamps */
+ struct cyclecounter cc;
+ struct timecounter tc;
+ spinlock_t tc_lock; /* spinlock to guard access tc->cycle_last */
+ struct delayed_work timestamp;
+
unsigned int hf_size_rx;
u8 active_channels;
};
@@ -388,15 +389,15 @@ static int gs_cmd_reset(struct gs_can *dev)
GFP_KERNEL);
}
-static inline int gs_usb_get_timestamp(const struct gs_can *dev,
+static inline int gs_usb_get_timestamp(const struct gs_usb *parent,
u32 *timestamp_p)
{
__le32 timestamp;
int rc;
- rc = usb_control_msg_recv(dev->udev, 0, GS_USB_BREQ_TIMESTAMP,
+ rc = usb_control_msg_recv(parent->udev, 0, GS_USB_BREQ_TIMESTAMP,
USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_INTERFACE,
- dev->channel, 0,
+ 0, 0,
×tamp, sizeof(timestamp),
USB_CTRL_GET_TIMEOUT,
GFP_KERNEL);
@@ -410,20 +411,20 @@ static inline int gs_usb_get_timestamp(const struct gs_can *dev,
static u64 gs_usb_timestamp_read(const struct cyclecounter *cc) __must_hold(&dev->tc_lock)
{
- struct gs_can *dev = container_of(cc, struct gs_can, cc);
+ struct gs_usb *parent = container_of(cc, struct gs_usb, cc);
u32 timestamp = 0;
int err;
- lockdep_assert_held(&dev->tc_lock);
+ lockdep_assert_held(&parent->tc_lock);
/* drop lock for synchronous USB transfer */
- spin_unlock_bh(&dev->tc_lock);
- err = gs_usb_get_timestamp(dev, ×tamp);
- spin_lock_bh(&dev->tc_lock);
+ spin_unlock_bh(&parent->tc_lock);
+ err = gs_usb_get_timestamp(parent, ×tamp);
+ spin_lock_bh(&parent->tc_lock);
if (err)
- netdev_err(dev->netdev,
- "Error %d while reading timestamp. HW timestamps may be inaccurate.",
- err);
+ dev_err(&parent->udev->dev,
+ "Error %d while reading timestamp. HW timestamps may be inaccurate.",
+ err);
return timestamp;
}
@@ -431,14 +432,14 @@ static u64 gs_usb_timestamp_read(const struct cyclecounter *cc) __must_hold(&dev
static void gs_usb_timestamp_work(struct work_struct *work)
{
struct delayed_work *delayed_work = to_delayed_work(work);
- struct gs_can *dev;
+ struct gs_usb *parent;
- dev = container_of(delayed_work, struct gs_can, timestamp);
- spin_lock_bh(&dev->tc_lock);
- timecounter_read(&dev->tc);
- spin_unlock_bh(&dev->tc_lock);
+ parent = container_of(delayed_work, struct gs_usb, timestamp);
+ spin_lock_bh(&parent->tc_lock);
+ timecounter_read(&parent->tc);
+ spin_unlock_bh(&parent->tc_lock);
- schedule_delayed_work(&dev->timestamp,
+ schedule_delayed_work(&parent->timestamp,
GS_USB_TIMESTAMP_WORK_DELAY_SEC * HZ);
}
@@ -446,37 +447,38 @@ static void gs_usb_skb_set_timestamp(struct gs_can *dev,
struct sk_buff *skb, u32 timestamp)
{
struct skb_shared_hwtstamps *hwtstamps = skb_hwtstamps(skb);
+ struct gs_usb *parent = dev->parent;
u64 ns;
- spin_lock_bh(&dev->tc_lock);
- ns = timecounter_cyc2time(&dev->tc, timestamp);
- spin_unlock_bh(&dev->tc_lock);
+ spin_lock_bh(&parent->tc_lock);
+ ns = timecounter_cyc2time(&parent->tc, timestamp);
+ spin_unlock_bh(&parent->tc_lock);
hwtstamps->hwtstamp = ns_to_ktime(ns);
}
-static void gs_usb_timestamp_init(struct gs_can *dev)
+static void gs_usb_timestamp_init(struct gs_usb *parent)
{
- struct cyclecounter *cc = &dev->cc;
+ struct cyclecounter *cc = &parent->cc;
cc->read = gs_usb_timestamp_read;
cc->mask = CYCLECOUNTER_MASK(32);
cc->shift = 32 - bits_per(NSEC_PER_SEC / GS_USB_TIMESTAMP_TIMER_HZ);
cc->mult = clocksource_hz2mult(GS_USB_TIMESTAMP_TIMER_HZ, cc->shift);
- spin_lock_init(&dev->tc_lock);
- spin_lock_bh(&dev->tc_lock);
- timecounter_init(&dev->tc, &dev->cc, ktime_get_real_ns());
- spin_unlock_bh(&dev->tc_lock);
+ spin_lock_init(&parent->tc_lock);
+ spin_lock_bh(&parent->tc_lock);
+ timecounter_init(&parent->tc, &parent->cc, ktime_get_real_ns());
+ spin_unlock_bh(&parent->tc_lock);
- INIT_DELAYED_WORK(&dev->timestamp, gs_usb_timestamp_work);
- schedule_delayed_work(&dev->timestamp,
+ INIT_DELAYED_WORK(&parent->timestamp, gs_usb_timestamp_work);
+ schedule_delayed_work(&parent->timestamp,
GS_USB_TIMESTAMP_WORK_DELAY_SEC * HZ);
}
-static void gs_usb_timestamp_stop(struct gs_can *dev)
+static void gs_usb_timestamp_stop(struct gs_usb *parent)
{
- cancel_delayed_work_sync(&dev->timestamp);
+ cancel_delayed_work_sync(&parent->timestamp);
}
static void gs_update_state(struct gs_can *dev, struct can_frame *cf)
@@ -560,6 +562,9 @@ static void gs_usb_receive_bulk_callback(struct urb *urb)
if (!netif_device_present(netdev))
return;
+ if (!netif_running(netdev))
+ goto resubmit_urb;
+
if (hf->echo_id == -1) { /* normal rx */
if (hf->flags & GS_CAN_FLAG_FD) {
skb = alloc_canfd_skb(dev->netdev, &cfd);
@@ -856,6 +861,9 @@ static int gs_can_open(struct net_device *netdev)
}
if (!parent->active_channels) {
+ if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP)
+ gs_usb_timestamp_init(parent);
+
for (i = 0; i < GS_MAX_RX_URBS; i++) {
u8 *buf;
@@ -926,13 +934,9 @@ static int gs_can_open(struct net_device *netdev)
flags |= GS_CAN_MODE_FD;
/* if hardware supports timestamps, enable it */
- if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP) {
+ if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP)
flags |= GS_CAN_MODE_HW_TIMESTAMP;
- /* start polling timestamp */
- gs_usb_timestamp_init(dev);
- }
-
/* finally start device */
dev->can.state = CAN_STATE_ERROR_ACTIVE;
dm.flags = cpu_to_le32(flags);
@@ -942,8 +946,6 @@ static int gs_can_open(struct net_device *netdev)
GFP_KERNEL);
if (rc) {
netdev_err(netdev, "Couldn't start device (err=%d)\n", rc);
- if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP)
- gs_usb_timestamp_stop(dev);
dev->can.state = CAN_STATE_STOPPED;
goto out_usb_kill_anchored_urbs;
@@ -960,9 +962,13 @@ static int gs_can_open(struct net_device *netdev)
out_usb_free_urb:
usb_free_urb(urb);
out_usb_kill_anchored_urbs:
- if (!parent->active_channels)
+ if (!parent->active_channels) {
usb_kill_anchored_urbs(&dev->tx_submitted);
+ if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP)
+ gs_usb_timestamp_stop(parent);
+ }
+
close_candev(netdev);
return rc;
@@ -1011,14 +1017,13 @@ static int gs_can_close(struct net_device *netdev)
netif_stop_queue(netdev);
- /* stop polling timestamp */
- if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP)
- gs_usb_timestamp_stop(dev);
-
/* Stop polling */
parent->active_channels--;
if (!parent->active_channels) {
usb_kill_anchored_urbs(&parent->rx_submitted);
+
+ if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP)
+ gs_usb_timestamp_stop(parent);
}
/* Stop sending URBs */
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 2603be9e8167ddc7bea95dcfab9ffc33414215aa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072354-hungrily-ancient-3409@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
2603be9e8167 ("can: gs_usb: gs_can_open(): improve error handling")
62f102c0d156 ("can: gs_usb: remove dma allocations")
5440428b3da6 ("can: gs_usb: gs_can_open(): fix race dev->can.state condition")
2bda24ef95c0 ("can: gs_usb: gs_usb_open/close(): fix memory leak")
c359931d2545 ("can: gs_usb: use union and FLEX_ARRAY for data in struct gs_host_frame")
5374d083117c ("can: gs_usb: gs_usb_probe(): introduce udev and make use of it")
c1ee72690cdd ("can: gs_usb: rewrap usb_control_msg() and usb_fill_bulk_urb()")
035b0fcf0270 ("can: gs_usb: change active_channels's type from atomic_t to u8")
108194666a3f ("can: gs_usb: use %u to print unsigned values")
5c39f26e67c9 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2603be9e8167ddc7bea95dcfab9ffc33414215aa Mon Sep 17 00:00:00 2001
From: Marc Kleine-Budde <mkl(a)pengutronix.de>
Date: Fri, 7 Jul 2023 13:43:10 +0200
Subject: [PATCH] can: gs_usb: gs_can_open(): improve error handling
The gs_usb driver handles USB devices with more than 1 CAN channel.
The RX path for all channels share the same bulk endpoint (the
transmitted bulk data encodes the channel number). These per-device
resources are allocated and submitted by the first opened channel.
During this allocation, the resources are either released immediately
in case of a failure or the URBs are anchored. All anchored URBs are
finally killed with gs_usb_disconnect().
Currently, gs_can_open() returns with an error if the allocation of a
URB or a buffer fails. However, if usb_submit_urb() fails, the driver
continues with the URBs submitted so far, even if no URBs were
successfully submitted.
Treat every error as fatal and free all allocated resources
immediately.
Switch to goto-style error handling, to prepare the driver for more
per-device resource allocation.
Cc: stable(a)vger.kernel.org
Cc: John Whittington <git(a)jbrengineering.co.uk>
Link: https://lore.kernel.org/all/20230716-gs_usb-fix-time-stamp-counter-v1-1-901…
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/drivers/net/can/usb/gs_usb.c b/drivers/net/can/usb/gs_usb.c
index d476c2884008..85b7b59c8426 100644
--- a/drivers/net/can/usb/gs_usb.c
+++ b/drivers/net/can/usb/gs_usb.c
@@ -833,6 +833,7 @@ static int gs_can_open(struct net_device *netdev)
.mode = cpu_to_le32(GS_CAN_MODE_START),
};
struct gs_host_frame *hf;
+ struct urb *urb = NULL;
u32 ctrlmode;
u32 flags = 0;
int rc, i;
@@ -856,13 +857,14 @@ static int gs_can_open(struct net_device *netdev)
if (!parent->active_channels) {
for (i = 0; i < GS_MAX_RX_URBS; i++) {
- struct urb *urb;
u8 *buf;
/* alloc rx urb */
urb = usb_alloc_urb(0, GFP_KERNEL);
- if (!urb)
- return -ENOMEM;
+ if (!urb) {
+ rc = -ENOMEM;
+ goto out_usb_kill_anchored_urbs;
+ }
/* alloc rx buffer */
buf = kmalloc(dev->parent->hf_size_rx,
@@ -870,8 +872,8 @@ static int gs_can_open(struct net_device *netdev)
if (!buf) {
netdev_err(netdev,
"No memory left for USB buffer\n");
- usb_free_urb(urb);
- return -ENOMEM;
+ rc = -ENOMEM;
+ goto out_usb_free_urb;
}
/* fill, anchor, and submit rx urb */
@@ -894,9 +896,7 @@ static int gs_can_open(struct net_device *netdev)
netdev_err(netdev,
"usb_submit failed (err=%d)\n", rc);
- usb_unanchor_urb(urb);
- usb_free_urb(urb);
- break;
+ goto out_usb_unanchor_urb;
}
/* Drop reference,
@@ -945,7 +945,8 @@ static int gs_can_open(struct net_device *netdev)
if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP)
gs_usb_timestamp_stop(dev);
dev->can.state = CAN_STATE_STOPPED;
- return rc;
+
+ goto out_usb_kill_anchored_urbs;
}
parent->active_channels++;
@@ -953,6 +954,18 @@ static int gs_can_open(struct net_device *netdev)
netif_start_queue(netdev);
return 0;
+
+out_usb_unanchor_urb:
+ usb_unanchor_urb(urb);
+out_usb_free_urb:
+ usb_free_urb(urb);
+out_usb_kill_anchored_urbs:
+ if (!parent->active_channels)
+ usb_kill_anchored_urbs(&dev->tx_submitted);
+
+ close_candev(netdev);
+
+ return rc;
}
static int gs_usb_get_state(const struct net_device *netdev,
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 2603be9e8167ddc7bea95dcfab9ffc33414215aa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072353-gummy-prowess-7c23@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
2603be9e8167 ("can: gs_usb: gs_can_open(): improve error handling")
62f102c0d156 ("can: gs_usb: remove dma allocations")
5440428b3da6 ("can: gs_usb: gs_can_open(): fix race dev->can.state condition")
2bda24ef95c0 ("can: gs_usb: gs_usb_open/close(): fix memory leak")
c359931d2545 ("can: gs_usb: use union and FLEX_ARRAY for data in struct gs_host_frame")
5374d083117c ("can: gs_usb: gs_usb_probe(): introduce udev and make use of it")
c1ee72690cdd ("can: gs_usb: rewrap usb_control_msg() and usb_fill_bulk_urb()")
035b0fcf0270 ("can: gs_usb: change active_channels's type from atomic_t to u8")
108194666a3f ("can: gs_usb: use %u to print unsigned values")
5c39f26e67c9 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2603be9e8167ddc7bea95dcfab9ffc33414215aa Mon Sep 17 00:00:00 2001
From: Marc Kleine-Budde <mkl(a)pengutronix.de>
Date: Fri, 7 Jul 2023 13:43:10 +0200
Subject: [PATCH] can: gs_usb: gs_can_open(): improve error handling
The gs_usb driver handles USB devices with more than 1 CAN channel.
The RX path for all channels share the same bulk endpoint (the
transmitted bulk data encodes the channel number). These per-device
resources are allocated and submitted by the first opened channel.
During this allocation, the resources are either released immediately
in case of a failure or the URBs are anchored. All anchored URBs are
finally killed with gs_usb_disconnect().
Currently, gs_can_open() returns with an error if the allocation of a
URB or a buffer fails. However, if usb_submit_urb() fails, the driver
continues with the URBs submitted so far, even if no URBs were
successfully submitted.
Treat every error as fatal and free all allocated resources
immediately.
Switch to goto-style error handling, to prepare the driver for more
per-device resource allocation.
Cc: stable(a)vger.kernel.org
Cc: John Whittington <git(a)jbrengineering.co.uk>
Link: https://lore.kernel.org/all/20230716-gs_usb-fix-time-stamp-counter-v1-1-901…
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/drivers/net/can/usb/gs_usb.c b/drivers/net/can/usb/gs_usb.c
index d476c2884008..85b7b59c8426 100644
--- a/drivers/net/can/usb/gs_usb.c
+++ b/drivers/net/can/usb/gs_usb.c
@@ -833,6 +833,7 @@ static int gs_can_open(struct net_device *netdev)
.mode = cpu_to_le32(GS_CAN_MODE_START),
};
struct gs_host_frame *hf;
+ struct urb *urb = NULL;
u32 ctrlmode;
u32 flags = 0;
int rc, i;
@@ -856,13 +857,14 @@ static int gs_can_open(struct net_device *netdev)
if (!parent->active_channels) {
for (i = 0; i < GS_MAX_RX_URBS; i++) {
- struct urb *urb;
u8 *buf;
/* alloc rx urb */
urb = usb_alloc_urb(0, GFP_KERNEL);
- if (!urb)
- return -ENOMEM;
+ if (!urb) {
+ rc = -ENOMEM;
+ goto out_usb_kill_anchored_urbs;
+ }
/* alloc rx buffer */
buf = kmalloc(dev->parent->hf_size_rx,
@@ -870,8 +872,8 @@ static int gs_can_open(struct net_device *netdev)
if (!buf) {
netdev_err(netdev,
"No memory left for USB buffer\n");
- usb_free_urb(urb);
- return -ENOMEM;
+ rc = -ENOMEM;
+ goto out_usb_free_urb;
}
/* fill, anchor, and submit rx urb */
@@ -894,9 +896,7 @@ static int gs_can_open(struct net_device *netdev)
netdev_err(netdev,
"usb_submit failed (err=%d)\n", rc);
- usb_unanchor_urb(urb);
- usb_free_urb(urb);
- break;
+ goto out_usb_unanchor_urb;
}
/* Drop reference,
@@ -945,7 +945,8 @@ static int gs_can_open(struct net_device *netdev)
if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP)
gs_usb_timestamp_stop(dev);
dev->can.state = CAN_STATE_STOPPED;
- return rc;
+
+ goto out_usb_kill_anchored_urbs;
}
parent->active_channels++;
@@ -953,6 +954,18 @@ static int gs_can_open(struct net_device *netdev)
netif_start_queue(netdev);
return 0;
+
+out_usb_unanchor_urb:
+ usb_unanchor_urb(urb);
+out_usb_free_urb:
+ usb_free_urb(urb);
+out_usb_kill_anchored_urbs:
+ if (!parent->active_channels)
+ usb_kill_anchored_urbs(&dev->tx_submitted);
+
+ close_candev(netdev);
+
+ return rc;
}
static int gs_usb_get_state(const struct net_device *netdev,
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 2603be9e8167ddc7bea95dcfab9ffc33414215aa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072351-brethren-premiere-da35@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
2603be9e8167 ("can: gs_usb: gs_can_open(): improve error handling")
62f102c0d156 ("can: gs_usb: remove dma allocations")
5440428b3da6 ("can: gs_usb: gs_can_open(): fix race dev->can.state condition")
2bda24ef95c0 ("can: gs_usb: gs_usb_open/close(): fix memory leak")
c359931d2545 ("can: gs_usb: use union and FLEX_ARRAY for data in struct gs_host_frame")
5374d083117c ("can: gs_usb: gs_usb_probe(): introduce udev and make use of it")
c1ee72690cdd ("can: gs_usb: rewrap usb_control_msg() and usb_fill_bulk_urb()")
035b0fcf0270 ("can: gs_usb: change active_channels's type from atomic_t to u8")
108194666a3f ("can: gs_usb: use %u to print unsigned values")
5c39f26e67c9 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2603be9e8167ddc7bea95dcfab9ffc33414215aa Mon Sep 17 00:00:00 2001
From: Marc Kleine-Budde <mkl(a)pengutronix.de>
Date: Fri, 7 Jul 2023 13:43:10 +0200
Subject: [PATCH] can: gs_usb: gs_can_open(): improve error handling
The gs_usb driver handles USB devices with more than 1 CAN channel.
The RX path for all channels share the same bulk endpoint (the
transmitted bulk data encodes the channel number). These per-device
resources are allocated and submitted by the first opened channel.
During this allocation, the resources are either released immediately
in case of a failure or the URBs are anchored. All anchored URBs are
finally killed with gs_usb_disconnect().
Currently, gs_can_open() returns with an error if the allocation of a
URB or a buffer fails. However, if usb_submit_urb() fails, the driver
continues with the URBs submitted so far, even if no URBs were
successfully submitted.
Treat every error as fatal and free all allocated resources
immediately.
Switch to goto-style error handling, to prepare the driver for more
per-device resource allocation.
Cc: stable(a)vger.kernel.org
Cc: John Whittington <git(a)jbrengineering.co.uk>
Link: https://lore.kernel.org/all/20230716-gs_usb-fix-time-stamp-counter-v1-1-901…
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/drivers/net/can/usb/gs_usb.c b/drivers/net/can/usb/gs_usb.c
index d476c2884008..85b7b59c8426 100644
--- a/drivers/net/can/usb/gs_usb.c
+++ b/drivers/net/can/usb/gs_usb.c
@@ -833,6 +833,7 @@ static int gs_can_open(struct net_device *netdev)
.mode = cpu_to_le32(GS_CAN_MODE_START),
};
struct gs_host_frame *hf;
+ struct urb *urb = NULL;
u32 ctrlmode;
u32 flags = 0;
int rc, i;
@@ -856,13 +857,14 @@ static int gs_can_open(struct net_device *netdev)
if (!parent->active_channels) {
for (i = 0; i < GS_MAX_RX_URBS; i++) {
- struct urb *urb;
u8 *buf;
/* alloc rx urb */
urb = usb_alloc_urb(0, GFP_KERNEL);
- if (!urb)
- return -ENOMEM;
+ if (!urb) {
+ rc = -ENOMEM;
+ goto out_usb_kill_anchored_urbs;
+ }
/* alloc rx buffer */
buf = kmalloc(dev->parent->hf_size_rx,
@@ -870,8 +872,8 @@ static int gs_can_open(struct net_device *netdev)
if (!buf) {
netdev_err(netdev,
"No memory left for USB buffer\n");
- usb_free_urb(urb);
- return -ENOMEM;
+ rc = -ENOMEM;
+ goto out_usb_free_urb;
}
/* fill, anchor, and submit rx urb */
@@ -894,9 +896,7 @@ static int gs_can_open(struct net_device *netdev)
netdev_err(netdev,
"usb_submit failed (err=%d)\n", rc);
- usb_unanchor_urb(urb);
- usb_free_urb(urb);
- break;
+ goto out_usb_unanchor_urb;
}
/* Drop reference,
@@ -945,7 +945,8 @@ static int gs_can_open(struct net_device *netdev)
if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP)
gs_usb_timestamp_stop(dev);
dev->can.state = CAN_STATE_STOPPED;
- return rc;
+
+ goto out_usb_kill_anchored_urbs;
}
parent->active_channels++;
@@ -953,6 +954,18 @@ static int gs_can_open(struct net_device *netdev)
netif_start_queue(netdev);
return 0;
+
+out_usb_unanchor_urb:
+ usb_unanchor_urb(urb);
+out_usb_free_urb:
+ usb_free_urb(urb);
+out_usb_kill_anchored_urbs:
+ if (!parent->active_channels)
+ usb_kill_anchored_urbs(&dev->tx_submitted);
+
+ close_candev(netdev);
+
+ return rc;
}
static int gs_usb_get_state(const struct net_device *netdev,
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 2603be9e8167ddc7bea95dcfab9ffc33414215aa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072350-remorse-graduate-cd73@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
2603be9e8167 ("can: gs_usb: gs_can_open(): improve error handling")
62f102c0d156 ("can: gs_usb: remove dma allocations")
5440428b3da6 ("can: gs_usb: gs_can_open(): fix race dev->can.state condition")
2bda24ef95c0 ("can: gs_usb: gs_usb_open/close(): fix memory leak")
c359931d2545 ("can: gs_usb: use union and FLEX_ARRAY for data in struct gs_host_frame")
5374d083117c ("can: gs_usb: gs_usb_probe(): introduce udev and make use of it")
c1ee72690cdd ("can: gs_usb: rewrap usb_control_msg() and usb_fill_bulk_urb()")
035b0fcf0270 ("can: gs_usb: change active_channels's type from atomic_t to u8")
108194666a3f ("can: gs_usb: use %u to print unsigned values")
5c39f26e67c9 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2603be9e8167ddc7bea95dcfab9ffc33414215aa Mon Sep 17 00:00:00 2001
From: Marc Kleine-Budde <mkl(a)pengutronix.de>
Date: Fri, 7 Jul 2023 13:43:10 +0200
Subject: [PATCH] can: gs_usb: gs_can_open(): improve error handling
The gs_usb driver handles USB devices with more than 1 CAN channel.
The RX path for all channels share the same bulk endpoint (the
transmitted bulk data encodes the channel number). These per-device
resources are allocated and submitted by the first opened channel.
During this allocation, the resources are either released immediately
in case of a failure or the URBs are anchored. All anchored URBs are
finally killed with gs_usb_disconnect().
Currently, gs_can_open() returns with an error if the allocation of a
URB or a buffer fails. However, if usb_submit_urb() fails, the driver
continues with the URBs submitted so far, even if no URBs were
successfully submitted.
Treat every error as fatal and free all allocated resources
immediately.
Switch to goto-style error handling, to prepare the driver for more
per-device resource allocation.
Cc: stable(a)vger.kernel.org
Cc: John Whittington <git(a)jbrengineering.co.uk>
Link: https://lore.kernel.org/all/20230716-gs_usb-fix-time-stamp-counter-v1-1-901…
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/drivers/net/can/usb/gs_usb.c b/drivers/net/can/usb/gs_usb.c
index d476c2884008..85b7b59c8426 100644
--- a/drivers/net/can/usb/gs_usb.c
+++ b/drivers/net/can/usb/gs_usb.c
@@ -833,6 +833,7 @@ static int gs_can_open(struct net_device *netdev)
.mode = cpu_to_le32(GS_CAN_MODE_START),
};
struct gs_host_frame *hf;
+ struct urb *urb = NULL;
u32 ctrlmode;
u32 flags = 0;
int rc, i;
@@ -856,13 +857,14 @@ static int gs_can_open(struct net_device *netdev)
if (!parent->active_channels) {
for (i = 0; i < GS_MAX_RX_URBS; i++) {
- struct urb *urb;
u8 *buf;
/* alloc rx urb */
urb = usb_alloc_urb(0, GFP_KERNEL);
- if (!urb)
- return -ENOMEM;
+ if (!urb) {
+ rc = -ENOMEM;
+ goto out_usb_kill_anchored_urbs;
+ }
/* alloc rx buffer */
buf = kmalloc(dev->parent->hf_size_rx,
@@ -870,8 +872,8 @@ static int gs_can_open(struct net_device *netdev)
if (!buf) {
netdev_err(netdev,
"No memory left for USB buffer\n");
- usb_free_urb(urb);
- return -ENOMEM;
+ rc = -ENOMEM;
+ goto out_usb_free_urb;
}
/* fill, anchor, and submit rx urb */
@@ -894,9 +896,7 @@ static int gs_can_open(struct net_device *netdev)
netdev_err(netdev,
"usb_submit failed (err=%d)\n", rc);
- usb_unanchor_urb(urb);
- usb_free_urb(urb);
- break;
+ goto out_usb_unanchor_urb;
}
/* Drop reference,
@@ -945,7 +945,8 @@ static int gs_can_open(struct net_device *netdev)
if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP)
gs_usb_timestamp_stop(dev);
dev->can.state = CAN_STATE_STOPPED;
- return rc;
+
+ goto out_usb_kill_anchored_urbs;
}
parent->active_channels++;
@@ -953,6 +954,18 @@ static int gs_can_open(struct net_device *netdev)
netif_start_queue(netdev);
return 0;
+
+out_usb_unanchor_urb:
+ usb_unanchor_urb(urb);
+out_usb_free_urb:
+ usb_free_urb(urb);
+out_usb_kill_anchored_urbs:
+ if (!parent->active_channels)
+ usb_kill_anchored_urbs(&dev->tx_submitted);
+
+ close_candev(netdev);
+
+ return rc;
}
static int gs_usb_get_state(const struct net_device *netdev,
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 2603be9e8167ddc7bea95dcfab9ffc33414215aa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072349-capricorn-donor-f895@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
2603be9e8167 ("can: gs_usb: gs_can_open(): improve error handling")
62f102c0d156 ("can: gs_usb: remove dma allocations")
5440428b3da6 ("can: gs_usb: gs_can_open(): fix race dev->can.state condition")
2bda24ef95c0 ("can: gs_usb: gs_usb_open/close(): fix memory leak")
c359931d2545 ("can: gs_usb: use union and FLEX_ARRAY for data in struct gs_host_frame")
5374d083117c ("can: gs_usb: gs_usb_probe(): introduce udev and make use of it")
c1ee72690cdd ("can: gs_usb: rewrap usb_control_msg() and usb_fill_bulk_urb()")
035b0fcf0270 ("can: gs_usb: change active_channels's type from atomic_t to u8")
108194666a3f ("can: gs_usb: use %u to print unsigned values")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2603be9e8167ddc7bea95dcfab9ffc33414215aa Mon Sep 17 00:00:00 2001
From: Marc Kleine-Budde <mkl(a)pengutronix.de>
Date: Fri, 7 Jul 2023 13:43:10 +0200
Subject: [PATCH] can: gs_usb: gs_can_open(): improve error handling
The gs_usb driver handles USB devices with more than 1 CAN channel.
The RX path for all channels share the same bulk endpoint (the
transmitted bulk data encodes the channel number). These per-device
resources are allocated and submitted by the first opened channel.
During this allocation, the resources are either released immediately
in case of a failure or the URBs are anchored. All anchored URBs are
finally killed with gs_usb_disconnect().
Currently, gs_can_open() returns with an error if the allocation of a
URB or a buffer fails. However, if usb_submit_urb() fails, the driver
continues with the URBs submitted so far, even if no URBs were
successfully submitted.
Treat every error as fatal and free all allocated resources
immediately.
Switch to goto-style error handling, to prepare the driver for more
per-device resource allocation.
Cc: stable(a)vger.kernel.org
Cc: John Whittington <git(a)jbrengineering.co.uk>
Link: https://lore.kernel.org/all/20230716-gs_usb-fix-time-stamp-counter-v1-1-901…
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/drivers/net/can/usb/gs_usb.c b/drivers/net/can/usb/gs_usb.c
index d476c2884008..85b7b59c8426 100644
--- a/drivers/net/can/usb/gs_usb.c
+++ b/drivers/net/can/usb/gs_usb.c
@@ -833,6 +833,7 @@ static int gs_can_open(struct net_device *netdev)
.mode = cpu_to_le32(GS_CAN_MODE_START),
};
struct gs_host_frame *hf;
+ struct urb *urb = NULL;
u32 ctrlmode;
u32 flags = 0;
int rc, i;
@@ -856,13 +857,14 @@ static int gs_can_open(struct net_device *netdev)
if (!parent->active_channels) {
for (i = 0; i < GS_MAX_RX_URBS; i++) {
- struct urb *urb;
u8 *buf;
/* alloc rx urb */
urb = usb_alloc_urb(0, GFP_KERNEL);
- if (!urb)
- return -ENOMEM;
+ if (!urb) {
+ rc = -ENOMEM;
+ goto out_usb_kill_anchored_urbs;
+ }
/* alloc rx buffer */
buf = kmalloc(dev->parent->hf_size_rx,
@@ -870,8 +872,8 @@ static int gs_can_open(struct net_device *netdev)
if (!buf) {
netdev_err(netdev,
"No memory left for USB buffer\n");
- usb_free_urb(urb);
- return -ENOMEM;
+ rc = -ENOMEM;
+ goto out_usb_free_urb;
}
/* fill, anchor, and submit rx urb */
@@ -894,9 +896,7 @@ static int gs_can_open(struct net_device *netdev)
netdev_err(netdev,
"usb_submit failed (err=%d)\n", rc);
- usb_unanchor_urb(urb);
- usb_free_urb(urb);
- break;
+ goto out_usb_unanchor_urb;
}
/* Drop reference,
@@ -945,7 +945,8 @@ static int gs_can_open(struct net_device *netdev)
if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP)
gs_usb_timestamp_stop(dev);
dev->can.state = CAN_STATE_STOPPED;
- return rc;
+
+ goto out_usb_kill_anchored_urbs;
}
parent->active_channels++;
@@ -953,6 +954,18 @@ static int gs_can_open(struct net_device *netdev)
netif_start_queue(netdev);
return 0;
+
+out_usb_unanchor_urb:
+ usb_unanchor_urb(urb);
+out_usb_free_urb:
+ usb_free_urb(urb);
+out_usb_kill_anchored_urbs:
+ if (!parent->active_channels)
+ usb_kill_anchored_urbs(&dev->tx_submitted);
+
+ close_candev(netdev);
+
+ return rc;
}
static int gs_usb_get_state(const struct net_device *netdev,
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 9efa1a5407e81265ea502cab83be4de503decc49
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072314-spent-tamper-2f00@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
9efa1a5407e8 ("can: mcp251xfd: __mcp251xfd_chip_set_mode(): increase poll timeout")
c9e6b80dfd48 ("can: mcp251xfd: update macros describing ring, FIFO and RAM layout")
0a1f2e6502a1 ("can: mcp251xfd: ring: prepare support for runtime configurable RX/TX ring parameters")
aada74220f00 ("can: mcp251xfd: mcp251xfd_priv: introduce macros specifying the number of supported TEF/RX/TX rings")
62713f0d9a38 ("can: mcp251xfd: ring: change order of TX and RX FIFOs")
617283b9c4db ("can: mcp251xfd: ring: prepare to change order of TX and RX FIFOs")
d2d5397fcae1 ("can: mcp251xfd: mcp251xfd_ring_init(): split ring_init into separate functions")
c912f19ee382 ("can: mcp251xfd: introduce struct mcp251xfd_tx_ring::nr and ::fifo_nr and make use of it")
2a68dd8663ea ("can: mcp251xfd: add support for internal PLL")
e39ea1360ca7 ("can: mcp251xfd: mcp251xfd_chip_clock_init(): prepare for PLL support, wait for OSC ready")
a10fd91e42e8 ("can: mcp251xfd: __mcp251xfd_chip_set_mode(): prepare for PLL support: improve error handling and diagnostics")
14193ea2bfee ("can: mcp251xfd: mcp251xfd_chip_timestamp_init(): factor out into separate function")
1ba3690fa2c6 ("can: mcp251xfd: mcp251xfd_chip_sleep(): introduce function to bring chip into sleep mode")
3044a4f271d2 ("can: mcp251xfd: introduce and make use of mcp251xfd_is_fd_mode()")
55bc37c85587 ("can: mcp251xfd: move ring init into separate function")
335c818c5a7a ("can: mcp251xfd: move chip FIFO init into separate file")
1e846c7aeb06 ("can: mcp251xfd: move TEF handling into separate file")
319fdbc9433c ("can: mcp251xfd: move RX handling into separate file")
cae9071bc5ea ("can: mcp251xfd: mcp251xfd.h: sort function prototypes")
99e7cc3b3f85 ("can: mcp251xfd: mcp251xfd_tef_obj_read(): fix typo in error message")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9efa1a5407e81265ea502cab83be4de503decc49 Mon Sep 17 00:00:00 2001
From: Fedor Ross <fedor.ross(a)ifm.com>
Date: Thu, 4 May 2023 21:50:59 +0200
Subject: [PATCH] can: mcp251xfd: __mcp251xfd_chip_set_mode(): increase poll
timeout
The mcp251xfd controller needs an idle bus to enter 'Normal CAN 2.0
mode' or . The maximum length of a CAN frame is 736 bits (64 data
bytes, CAN-FD, EFF mode, worst case bit stuffing and interframe
spacing). For low bit rates like 10 kbit/s the arbitrarily chosen
MCP251XFD_POLL_TIMEOUT_US of 1 ms is too small.
Otherwise during polling for the CAN controller to enter 'Normal CAN
2.0 mode' the timeout limit is exceeded and the configuration fails
with:
| $ ip link set dev can1 up type can bitrate 10000
| [ 731.911072] mcp251xfd spi2.1 can1: Controller failed to enter mode CAN 2.0 Mode (6) and stays in Configuration Mode (4) (con=0x068b0760, osc=0x00000468).
| [ 731.927192] mcp251xfd spi2.1 can1: CRC read error at address 0x0e0c (length=4, data=00 00 00 00, CRC=0x0000) retrying.
| [ 731.938101] A link change request failed with some changes committed already. Interface can1 may have been left with an inconsistent configuration, please check.
| RTNETLINK answers: Connection timed out
Make MCP251XFD_POLL_TIMEOUT_US timeout calculation dynamic. Use
maximum of 1ms and bit time of 1 full 64 data bytes CAN-FD frame in
EFF mode, worst case bit stuffing and interframe spacing at the
current bit rate.
For easier backporting define the macro MCP251XFD_FRAME_LEN_MAX_BITS
that holds the max frame length in bits, which is 736. This can be
replaced by can_frame_bits(true, true, true, true, CANFD_MAX_DLEN) in
a cleanup patch later.
Fixes: 55e5b97f003e8 ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Signed-off-by: Fedor Ross <fedor.ross(a)ifm.com>
Signed-off-by: Marek Vasut <marex(a)denx.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20230717-mcp251xfd-fix-increase-poll-timeout-v5…
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
index 68df6d4641b5..eebf967f4711 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
@@ -227,6 +227,8 @@ static int
__mcp251xfd_chip_set_mode(const struct mcp251xfd_priv *priv,
const u8 mode_req, bool nowait)
{
+ const struct can_bittiming *bt = &priv->can.bittiming;
+ unsigned long timeout_us = MCP251XFD_POLL_TIMEOUT_US;
u32 con = 0, con_reqop, osc = 0;
u8 mode;
int err;
@@ -246,12 +248,16 @@ __mcp251xfd_chip_set_mode(const struct mcp251xfd_priv *priv,
if (mode_req == MCP251XFD_REG_CON_MODE_SLEEP || nowait)
return 0;
+ if (bt->bitrate)
+ timeout_us = max_t(unsigned long, timeout_us,
+ MCP251XFD_FRAME_LEN_MAX_BITS * USEC_PER_SEC /
+ bt->bitrate);
+
err = regmap_read_poll_timeout(priv->map_reg, MCP251XFD_REG_CON, con,
!mcp251xfd_reg_invalid(con) &&
FIELD_GET(MCP251XFD_REG_CON_OPMOD_MASK,
con) == mode_req,
- MCP251XFD_POLL_SLEEP_US,
- MCP251XFD_POLL_TIMEOUT_US);
+ MCP251XFD_POLL_SLEEP_US, timeout_us);
if (err != -ETIMEDOUT && err != -EBADMSG)
return err;
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
index 7024ff0cc2c0..24510b3b8020 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
@@ -387,6 +387,7 @@ static_assert(MCP251XFD_TIMESTAMP_WORK_DELAY_SEC <
#define MCP251XFD_OSC_STAB_TIMEOUT_US (10 * MCP251XFD_OSC_STAB_SLEEP_US)
#define MCP251XFD_POLL_SLEEP_US (10)
#define MCP251XFD_POLL_TIMEOUT_US (USEC_PER_MSEC)
+#define MCP251XFD_FRAME_LEN_MAX_BITS (736)
/* Misc */
#define MCP251XFD_NAPI_WEIGHT 32
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 9efa1a5407e81265ea502cab83be4de503decc49
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072312-thaw-tug-693d@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
9efa1a5407e8 ("can: mcp251xfd: __mcp251xfd_chip_set_mode(): increase poll timeout")
c9e6b80dfd48 ("can: mcp251xfd: update macros describing ring, FIFO and RAM layout")
0a1f2e6502a1 ("can: mcp251xfd: ring: prepare support for runtime configurable RX/TX ring parameters")
aada74220f00 ("can: mcp251xfd: mcp251xfd_priv: introduce macros specifying the number of supported TEF/RX/TX rings")
62713f0d9a38 ("can: mcp251xfd: ring: change order of TX and RX FIFOs")
617283b9c4db ("can: mcp251xfd: ring: prepare to change order of TX and RX FIFOs")
d2d5397fcae1 ("can: mcp251xfd: mcp251xfd_ring_init(): split ring_init into separate functions")
c912f19ee382 ("can: mcp251xfd: introduce struct mcp251xfd_tx_ring::nr and ::fifo_nr and make use of it")
2a68dd8663ea ("can: mcp251xfd: add support for internal PLL")
e39ea1360ca7 ("can: mcp251xfd: mcp251xfd_chip_clock_init(): prepare for PLL support, wait for OSC ready")
a10fd91e42e8 ("can: mcp251xfd: __mcp251xfd_chip_set_mode(): prepare for PLL support: improve error handling and diagnostics")
14193ea2bfee ("can: mcp251xfd: mcp251xfd_chip_timestamp_init(): factor out into separate function")
1ba3690fa2c6 ("can: mcp251xfd: mcp251xfd_chip_sleep(): introduce function to bring chip into sleep mode")
3044a4f271d2 ("can: mcp251xfd: introduce and make use of mcp251xfd_is_fd_mode()")
55bc37c85587 ("can: mcp251xfd: move ring init into separate function")
335c818c5a7a ("can: mcp251xfd: move chip FIFO init into separate file")
1e846c7aeb06 ("can: mcp251xfd: move TEF handling into separate file")
319fdbc9433c ("can: mcp251xfd: move RX handling into separate file")
cae9071bc5ea ("can: mcp251xfd: mcp251xfd.h: sort function prototypes")
99e7cc3b3f85 ("can: mcp251xfd: mcp251xfd_tef_obj_read(): fix typo in error message")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9efa1a5407e81265ea502cab83be4de503decc49 Mon Sep 17 00:00:00 2001
From: Fedor Ross <fedor.ross(a)ifm.com>
Date: Thu, 4 May 2023 21:50:59 +0200
Subject: [PATCH] can: mcp251xfd: __mcp251xfd_chip_set_mode(): increase poll
timeout
The mcp251xfd controller needs an idle bus to enter 'Normal CAN 2.0
mode' or . The maximum length of a CAN frame is 736 bits (64 data
bytes, CAN-FD, EFF mode, worst case bit stuffing and interframe
spacing). For low bit rates like 10 kbit/s the arbitrarily chosen
MCP251XFD_POLL_TIMEOUT_US of 1 ms is too small.
Otherwise during polling for the CAN controller to enter 'Normal CAN
2.0 mode' the timeout limit is exceeded and the configuration fails
with:
| $ ip link set dev can1 up type can bitrate 10000
| [ 731.911072] mcp251xfd spi2.1 can1: Controller failed to enter mode CAN 2.0 Mode (6) and stays in Configuration Mode (4) (con=0x068b0760, osc=0x00000468).
| [ 731.927192] mcp251xfd spi2.1 can1: CRC read error at address 0x0e0c (length=4, data=00 00 00 00, CRC=0x0000) retrying.
| [ 731.938101] A link change request failed with some changes committed already. Interface can1 may have been left with an inconsistent configuration, please check.
| RTNETLINK answers: Connection timed out
Make MCP251XFD_POLL_TIMEOUT_US timeout calculation dynamic. Use
maximum of 1ms and bit time of 1 full 64 data bytes CAN-FD frame in
EFF mode, worst case bit stuffing and interframe spacing at the
current bit rate.
For easier backporting define the macro MCP251XFD_FRAME_LEN_MAX_BITS
that holds the max frame length in bits, which is 736. This can be
replaced by can_frame_bits(true, true, true, true, CANFD_MAX_DLEN) in
a cleanup patch later.
Fixes: 55e5b97f003e8 ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Signed-off-by: Fedor Ross <fedor.ross(a)ifm.com>
Signed-off-by: Marek Vasut <marex(a)denx.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20230717-mcp251xfd-fix-increase-poll-timeout-v5…
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
index 68df6d4641b5..eebf967f4711 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
@@ -227,6 +227,8 @@ static int
__mcp251xfd_chip_set_mode(const struct mcp251xfd_priv *priv,
const u8 mode_req, bool nowait)
{
+ const struct can_bittiming *bt = &priv->can.bittiming;
+ unsigned long timeout_us = MCP251XFD_POLL_TIMEOUT_US;
u32 con = 0, con_reqop, osc = 0;
u8 mode;
int err;
@@ -246,12 +248,16 @@ __mcp251xfd_chip_set_mode(const struct mcp251xfd_priv *priv,
if (mode_req == MCP251XFD_REG_CON_MODE_SLEEP || nowait)
return 0;
+ if (bt->bitrate)
+ timeout_us = max_t(unsigned long, timeout_us,
+ MCP251XFD_FRAME_LEN_MAX_BITS * USEC_PER_SEC /
+ bt->bitrate);
+
err = regmap_read_poll_timeout(priv->map_reg, MCP251XFD_REG_CON, con,
!mcp251xfd_reg_invalid(con) &&
FIELD_GET(MCP251XFD_REG_CON_OPMOD_MASK,
con) == mode_req,
- MCP251XFD_POLL_SLEEP_US,
- MCP251XFD_POLL_TIMEOUT_US);
+ MCP251XFD_POLL_SLEEP_US, timeout_us);
if (err != -ETIMEDOUT && err != -EBADMSG)
return err;
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
index 7024ff0cc2c0..24510b3b8020 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
@@ -387,6 +387,7 @@ static_assert(MCP251XFD_TIMESTAMP_WORK_DELAY_SEC <
#define MCP251XFD_OSC_STAB_TIMEOUT_US (10 * MCP251XFD_OSC_STAB_SLEEP_US)
#define MCP251XFD_POLL_SLEEP_US (10)
#define MCP251XFD_POLL_TIMEOUT_US (USEC_PER_MSEC)
+#define MCP251XFD_FRAME_LEN_MAX_BITS (736)
/* Misc */
#define MCP251XFD_NAPI_WEIGHT 32
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 0657b20c5a76c938612f8409735a8830d257866e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072320-veal-canopy-e2f5@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
0657b20c5a76 ("btrfs: fix use-after-free of new block group that became unused")
a9f189716cf1 ("btrfs: move out now unused BG from the reclaim list")
961f5b8bf48a ("btrfs: convert btrfs_block_group::seq_zone to runtime flag")
0d7764ff58b4 ("btrfs: convert btrfs_block_group::needs_free_space to runtime flag")
3349b57fd47b ("btrfs: convert block group bit field to use bit helpers")
9d4b0a129a0d ("btrfs: simplify arguments of btrfs_update_space_info and rename")
6ca64ac27631 ("btrfs: zoned: fix mounting with conventional zones")
ced8ecf026fd ("btrfs: fix space cache corruption and potential double allocations")
b09315139136 ("btrfs: zoned: activate metadata block group on flush_space")
6a921de58992 ("btrfs: zoned: introduce space_info->active_total_bytes")
393f646e34c1 ("btrfs: zoned: finish least available block group on data bg allocation")
bb9950d3df71 ("btrfs: let can_allocate_chunk return error")
f6fca3917b4d ("btrfs: store chunk size in space-info struct")
b3a3b0255797 ("btrfs: zoned: drop optimization of zone finish")
343d8a30851c ("btrfs: zoned: prevent allocation from previous data relocation BG")
56fbb0a4e8b3 ("btrfs: zoned: properly finish block group on metadata write")
d70cbdda75da ("btrfs: zoned: consolidate zone finish functions")
09022b14fafc ("btrfs: scrub: introduce dedicated helper to scrub simple-mirror based range")
416bd7e7af60 ("btrfs: scrub: introduce a helper to locate an extent item")
16b0c2581e3a ("btrfs: use a read/write lock for protecting the block groups tree")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0657b20c5a76c938612f8409735a8830d257866e Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Wed, 28 Jun 2023 17:13:37 +0100
Subject: [PATCH] btrfs: fix use-after-free of new block group that became
unused
If a task creates a new block group and that block group becomes unused
before we finish its creation, at btrfs_create_pending_block_groups(),
then when btrfs_mark_bg_unused() is called against the block group, we
assume that the block group is currently in the list of block groups to
reclaim, and we move it out of the list of new block groups and into the
list of unused block groups. This has two consequences:
1) We move it out of the list of new block groups associated to the
current transaction. So the block group creation is not finished and
if we attempt to delete the bg because it's unused, we will not find
the block group item in the extent tree (or the new block group tree),
its device extent items in the device tree etc, resulting in the
deletion to fail due to the missing items;
2) We don't increment the reference count on the block group when we
move it to the list of unused block groups, because we assumed the
block group was on the list of block groups to reclaim, and in that
case it already has the correct reference count. However the block
group was on the list of new block groups, in which case no extra
reference was taken because it's local to the current task. This
later results in doing an extra reference count decrement when
removing the block group from the unused list, eventually leading the
reference count to 0.
This second case was caught when running generic/297 from fstests, which
produced the following assertion failure and stack trace:
[589.559] assertion failed: refcount_read(&block_group->refs) == 1, in fs/btrfs/block-group.c:4299
[589.559] ------------[ cut here ]------------
[589.559] kernel BUG at fs/btrfs/block-group.c:4299!
[589.560] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[589.560] CPU: 8 PID: 2819134 Comm: umount Tainted: G W 6.4.0-rc6-btrfs-next-134+ #1
[589.560] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[589.560] RIP: 0010:btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.561] Code: 68 62 da c0 (...)
[589.561] RSP: 0018:ffffa55a8c3b3d98 EFLAGS: 00010246
[589.561] RAX: 0000000000000058 RBX: ffff8f030d7f2000 RCX: 0000000000000000
[589.562] RDX: 0000000000000000 RSI: ffffffff953f0878 RDI: 00000000ffffffff
[589.562] RBP: ffff8f030d7f2088 R08: 0000000000000000 R09: ffffa55a8c3b3c50
[589.562] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8f05850b4c00
[589.562] R13: ffff8f030d7f2090 R14: ffff8f05850b4cd8 R15: dead000000000100
[589.563] FS: 00007f497fd2e840(0000) GS:ffff8f09dfc00000(0000) knlGS:0000000000000000
[589.563] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[589.563] CR2: 00007f497ff8ec10 CR3: 0000000271472006 CR4: 0000000000370ee0
[589.563] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[589.564] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[589.564] Call Trace:
[589.564] <TASK>
[589.565] ? __die_body+0x1b/0x60
[589.565] ? die+0x39/0x60
[589.565] ? do_trap+0xeb/0x110
[589.565] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.566] ? do_error_trap+0x6a/0x90
[589.566] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.566] ? exc_invalid_op+0x4e/0x70
[589.566] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.567] ? asm_exc_invalid_op+0x16/0x20
[589.567] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.567] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.567] close_ctree+0x35d/0x560 [btrfs]
[589.568] ? fsnotify_sb_delete+0x13e/0x1d0
[589.568] ? dispose_list+0x3a/0x50
[589.568] ? evict_inodes+0x151/0x1a0
[589.568] generic_shutdown_super+0x73/0x1a0
[589.569] kill_anon_super+0x14/0x30
[589.569] btrfs_kill_super+0x12/0x20 [btrfs]
[589.569] deactivate_locked_super+0x2e/0x70
[589.569] cleanup_mnt+0x104/0x160
[589.570] task_work_run+0x56/0x90
[589.570] exit_to_user_mode_prepare+0x160/0x170
[589.570] syscall_exit_to_user_mode+0x22/0x50
[589.570] ? __x64_sys_umount+0x12/0x20
[589.571] do_syscall_64+0x48/0x90
[589.571] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[589.571] RIP: 0033:0x7f497ff0a567
[589.571] Code: af 98 0e (...)
[589.572] RSP: 002b:00007ffc98347358 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[589.572] RAX: 0000000000000000 RBX: 00007f49800b8264 RCX: 00007f497ff0a567
[589.572] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000557f558abfa0
[589.573] RBP: 0000557f558a6ba0 R08: 0000000000000000 R09: 00007ffc98346100
[589.573] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[589.573] R13: 0000557f558abfa0 R14: 0000557f558a6cb0 R15: 0000557f558a6dd0
[589.573] </TASK>
[589.574] Modules linked in: dm_snapshot dm_thin_pool (...)
[589.576] ---[ end trace 0000000000000000 ]---
Fix this by adding a runtime flag to the block group to tell that the
block group is still in the list of new block groups, and therefore it
should not be moved to the list of unused block groups, at
btrfs_mark_bg_unused(), until the flag is cleared, when we finish the
creation of the block group at btrfs_create_pending_block_groups().
Fixes: a9f189716cf1 ("btrfs: move out now unused BG from the reclaim list")
CC: stable(a)vger.kernel.org # 5.15+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 6753524b146c..f53297726238 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1640,13 +1640,14 @@ void btrfs_mark_bg_unused(struct btrfs_block_group *bg)
{
struct btrfs_fs_info *fs_info = bg->fs_info;
- trace_btrfs_add_unused_block_group(bg);
spin_lock(&fs_info->unused_bgs_lock);
if (list_empty(&bg->bg_list)) {
btrfs_get_block_group(bg);
+ trace_btrfs_add_unused_block_group(bg);
list_add_tail(&bg->bg_list, &fs_info->unused_bgs);
- } else {
+ } else if (!test_bit(BLOCK_GROUP_FLAG_NEW, &bg->runtime_flags)) {
/* Pull out the block group from the reclaim_bgs list. */
+ trace_btrfs_add_unused_block_group(bg);
list_move_tail(&bg->bg_list, &fs_info->unused_bgs);
}
spin_unlock(&fs_info->unused_bgs_lock);
@@ -2668,6 +2669,7 @@ void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans)
next:
btrfs_delayed_refs_rsv_release(fs_info, 1);
list_del_init(&block_group->bg_list);
+ clear_bit(BLOCK_GROUP_FLAG_NEW, &block_group->runtime_flags);
}
btrfs_trans_release_chunk_metadata(trans);
}
@@ -2707,6 +2709,13 @@ struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *tran
if (!cache)
return ERR_PTR(-ENOMEM);
+ /*
+ * Mark it as new before adding it to the rbtree of block groups or any
+ * list, so that no other task finds it and calls btrfs_mark_bg_unused()
+ * before the new flag is set.
+ */
+ set_bit(BLOCK_GROUP_FLAG_NEW, &cache->runtime_flags);
+
cache->length = size;
set_free_space_tree_thresholds(cache);
cache->flags = type;
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index f204addc3fe8..381c54a56417 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -70,6 +70,11 @@ enum btrfs_block_group_flags {
BLOCK_GROUP_FLAG_NEEDS_FREE_SPACE,
/* Indicate that the block group is placed on a sequential zone */
BLOCK_GROUP_FLAG_SEQUENTIAL_ZONE,
+ /*
+ * Indicate that block group is in the list of new block groups of a
+ * transaction.
+ */
+ BLOCK_GROUP_FLAG_NEW,
};
enum btrfs_caching_type {
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 0657b20c5a76c938612f8409735a8830d257866e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072317-grandkid-uncommon-ca27@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
0657b20c5a76 ("btrfs: fix use-after-free of new block group that became unused")
a9f189716cf1 ("btrfs: move out now unused BG from the reclaim list")
961f5b8bf48a ("btrfs: convert btrfs_block_group::seq_zone to runtime flag")
0d7764ff58b4 ("btrfs: convert btrfs_block_group::needs_free_space to runtime flag")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0657b20c5a76c938612f8409735a8830d257866e Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Wed, 28 Jun 2023 17:13:37 +0100
Subject: [PATCH] btrfs: fix use-after-free of new block group that became
unused
If a task creates a new block group and that block group becomes unused
before we finish its creation, at btrfs_create_pending_block_groups(),
then when btrfs_mark_bg_unused() is called against the block group, we
assume that the block group is currently in the list of block groups to
reclaim, and we move it out of the list of new block groups and into the
list of unused block groups. This has two consequences:
1) We move it out of the list of new block groups associated to the
current transaction. So the block group creation is not finished and
if we attempt to delete the bg because it's unused, we will not find
the block group item in the extent tree (or the new block group tree),
its device extent items in the device tree etc, resulting in the
deletion to fail due to the missing items;
2) We don't increment the reference count on the block group when we
move it to the list of unused block groups, because we assumed the
block group was on the list of block groups to reclaim, and in that
case it already has the correct reference count. However the block
group was on the list of new block groups, in which case no extra
reference was taken because it's local to the current task. This
later results in doing an extra reference count decrement when
removing the block group from the unused list, eventually leading the
reference count to 0.
This second case was caught when running generic/297 from fstests, which
produced the following assertion failure and stack trace:
[589.559] assertion failed: refcount_read(&block_group->refs) == 1, in fs/btrfs/block-group.c:4299
[589.559] ------------[ cut here ]------------
[589.559] kernel BUG at fs/btrfs/block-group.c:4299!
[589.560] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[589.560] CPU: 8 PID: 2819134 Comm: umount Tainted: G W 6.4.0-rc6-btrfs-next-134+ #1
[589.560] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[589.560] RIP: 0010:btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.561] Code: 68 62 da c0 (...)
[589.561] RSP: 0018:ffffa55a8c3b3d98 EFLAGS: 00010246
[589.561] RAX: 0000000000000058 RBX: ffff8f030d7f2000 RCX: 0000000000000000
[589.562] RDX: 0000000000000000 RSI: ffffffff953f0878 RDI: 00000000ffffffff
[589.562] RBP: ffff8f030d7f2088 R08: 0000000000000000 R09: ffffa55a8c3b3c50
[589.562] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8f05850b4c00
[589.562] R13: ffff8f030d7f2090 R14: ffff8f05850b4cd8 R15: dead000000000100
[589.563] FS: 00007f497fd2e840(0000) GS:ffff8f09dfc00000(0000) knlGS:0000000000000000
[589.563] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[589.563] CR2: 00007f497ff8ec10 CR3: 0000000271472006 CR4: 0000000000370ee0
[589.563] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[589.564] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[589.564] Call Trace:
[589.564] <TASK>
[589.565] ? __die_body+0x1b/0x60
[589.565] ? die+0x39/0x60
[589.565] ? do_trap+0xeb/0x110
[589.565] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.566] ? do_error_trap+0x6a/0x90
[589.566] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.566] ? exc_invalid_op+0x4e/0x70
[589.566] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.567] ? asm_exc_invalid_op+0x16/0x20
[589.567] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.567] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.567] close_ctree+0x35d/0x560 [btrfs]
[589.568] ? fsnotify_sb_delete+0x13e/0x1d0
[589.568] ? dispose_list+0x3a/0x50
[589.568] ? evict_inodes+0x151/0x1a0
[589.568] generic_shutdown_super+0x73/0x1a0
[589.569] kill_anon_super+0x14/0x30
[589.569] btrfs_kill_super+0x12/0x20 [btrfs]
[589.569] deactivate_locked_super+0x2e/0x70
[589.569] cleanup_mnt+0x104/0x160
[589.570] task_work_run+0x56/0x90
[589.570] exit_to_user_mode_prepare+0x160/0x170
[589.570] syscall_exit_to_user_mode+0x22/0x50
[589.570] ? __x64_sys_umount+0x12/0x20
[589.571] do_syscall_64+0x48/0x90
[589.571] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[589.571] RIP: 0033:0x7f497ff0a567
[589.571] Code: af 98 0e (...)
[589.572] RSP: 002b:00007ffc98347358 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[589.572] RAX: 0000000000000000 RBX: 00007f49800b8264 RCX: 00007f497ff0a567
[589.572] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000557f558abfa0
[589.573] RBP: 0000557f558a6ba0 R08: 0000000000000000 R09: 00007ffc98346100
[589.573] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[589.573] R13: 0000557f558abfa0 R14: 0000557f558a6cb0 R15: 0000557f558a6dd0
[589.573] </TASK>
[589.574] Modules linked in: dm_snapshot dm_thin_pool (...)
[589.576] ---[ end trace 0000000000000000 ]---
Fix this by adding a runtime flag to the block group to tell that the
block group is still in the list of new block groups, and therefore it
should not be moved to the list of unused block groups, at
btrfs_mark_bg_unused(), until the flag is cleared, when we finish the
creation of the block group at btrfs_create_pending_block_groups().
Fixes: a9f189716cf1 ("btrfs: move out now unused BG from the reclaim list")
CC: stable(a)vger.kernel.org # 5.15+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 6753524b146c..f53297726238 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1640,13 +1640,14 @@ void btrfs_mark_bg_unused(struct btrfs_block_group *bg)
{
struct btrfs_fs_info *fs_info = bg->fs_info;
- trace_btrfs_add_unused_block_group(bg);
spin_lock(&fs_info->unused_bgs_lock);
if (list_empty(&bg->bg_list)) {
btrfs_get_block_group(bg);
+ trace_btrfs_add_unused_block_group(bg);
list_add_tail(&bg->bg_list, &fs_info->unused_bgs);
- } else {
+ } else if (!test_bit(BLOCK_GROUP_FLAG_NEW, &bg->runtime_flags)) {
/* Pull out the block group from the reclaim_bgs list. */
+ trace_btrfs_add_unused_block_group(bg);
list_move_tail(&bg->bg_list, &fs_info->unused_bgs);
}
spin_unlock(&fs_info->unused_bgs_lock);
@@ -2668,6 +2669,7 @@ void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans)
next:
btrfs_delayed_refs_rsv_release(fs_info, 1);
list_del_init(&block_group->bg_list);
+ clear_bit(BLOCK_GROUP_FLAG_NEW, &block_group->runtime_flags);
}
btrfs_trans_release_chunk_metadata(trans);
}
@@ -2707,6 +2709,13 @@ struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *tran
if (!cache)
return ERR_PTR(-ENOMEM);
+ /*
+ * Mark it as new before adding it to the rbtree of block groups or any
+ * list, so that no other task finds it and calls btrfs_mark_bg_unused()
+ * before the new flag is set.
+ */
+ set_bit(BLOCK_GROUP_FLAG_NEW, &cache->runtime_flags);
+
cache->length = size;
set_free_space_tree_thresholds(cache);
cache->flags = type;
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index f204addc3fe8..381c54a56417 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -70,6 +70,11 @@ enum btrfs_block_group_flags {
BLOCK_GROUP_FLAG_NEEDS_FREE_SPACE,
/* Indicate that the block group is placed on a sequential zone */
BLOCK_GROUP_FLAG_SEQUENTIAL_ZONE,
+ /*
+ * Indicate that block group is in the list of new block groups of a
+ * transaction.
+ */
+ BLOCK_GROUP_FLAG_NEW,
};
enum btrfs_caching_type {
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 0657b20c5a76c938612f8409735a8830d257866e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072313-onshore-immunize-bd55@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0657b20c5a76c938612f8409735a8830d257866e Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Wed, 28 Jun 2023 17:13:37 +0100
Subject: [PATCH] btrfs: fix use-after-free of new block group that became
unused
If a task creates a new block group and that block group becomes unused
before we finish its creation, at btrfs_create_pending_block_groups(),
then when btrfs_mark_bg_unused() is called against the block group, we
assume that the block group is currently in the list of block groups to
reclaim, and we move it out of the list of new block groups and into the
list of unused block groups. This has two consequences:
1) We move it out of the list of new block groups associated to the
current transaction. So the block group creation is not finished and
if we attempt to delete the bg because it's unused, we will not find
the block group item in the extent tree (or the new block group tree),
its device extent items in the device tree etc, resulting in the
deletion to fail due to the missing items;
2) We don't increment the reference count on the block group when we
move it to the list of unused block groups, because we assumed the
block group was on the list of block groups to reclaim, and in that
case it already has the correct reference count. However the block
group was on the list of new block groups, in which case no extra
reference was taken because it's local to the current task. This
later results in doing an extra reference count decrement when
removing the block group from the unused list, eventually leading the
reference count to 0.
This second case was caught when running generic/297 from fstests, which
produced the following assertion failure and stack trace:
[589.559] assertion failed: refcount_read(&block_group->refs) == 1, in fs/btrfs/block-group.c:4299
[589.559] ------------[ cut here ]------------
[589.559] kernel BUG at fs/btrfs/block-group.c:4299!
[589.560] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[589.560] CPU: 8 PID: 2819134 Comm: umount Tainted: G W 6.4.0-rc6-btrfs-next-134+ #1
[589.560] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[589.560] RIP: 0010:btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.561] Code: 68 62 da c0 (...)
[589.561] RSP: 0018:ffffa55a8c3b3d98 EFLAGS: 00010246
[589.561] RAX: 0000000000000058 RBX: ffff8f030d7f2000 RCX: 0000000000000000
[589.562] RDX: 0000000000000000 RSI: ffffffff953f0878 RDI: 00000000ffffffff
[589.562] RBP: ffff8f030d7f2088 R08: 0000000000000000 R09: ffffa55a8c3b3c50
[589.562] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8f05850b4c00
[589.562] R13: ffff8f030d7f2090 R14: ffff8f05850b4cd8 R15: dead000000000100
[589.563] FS: 00007f497fd2e840(0000) GS:ffff8f09dfc00000(0000) knlGS:0000000000000000
[589.563] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[589.563] CR2: 00007f497ff8ec10 CR3: 0000000271472006 CR4: 0000000000370ee0
[589.563] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[589.564] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[589.564] Call Trace:
[589.564] <TASK>
[589.565] ? __die_body+0x1b/0x60
[589.565] ? die+0x39/0x60
[589.565] ? do_trap+0xeb/0x110
[589.565] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.566] ? do_error_trap+0x6a/0x90
[589.566] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.566] ? exc_invalid_op+0x4e/0x70
[589.566] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.567] ? asm_exc_invalid_op+0x16/0x20
[589.567] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.567] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.567] close_ctree+0x35d/0x560 [btrfs]
[589.568] ? fsnotify_sb_delete+0x13e/0x1d0
[589.568] ? dispose_list+0x3a/0x50
[589.568] ? evict_inodes+0x151/0x1a0
[589.568] generic_shutdown_super+0x73/0x1a0
[589.569] kill_anon_super+0x14/0x30
[589.569] btrfs_kill_super+0x12/0x20 [btrfs]
[589.569] deactivate_locked_super+0x2e/0x70
[589.569] cleanup_mnt+0x104/0x160
[589.570] task_work_run+0x56/0x90
[589.570] exit_to_user_mode_prepare+0x160/0x170
[589.570] syscall_exit_to_user_mode+0x22/0x50
[589.570] ? __x64_sys_umount+0x12/0x20
[589.571] do_syscall_64+0x48/0x90
[589.571] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[589.571] RIP: 0033:0x7f497ff0a567
[589.571] Code: af 98 0e (...)
[589.572] RSP: 002b:00007ffc98347358 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[589.572] RAX: 0000000000000000 RBX: 00007f49800b8264 RCX: 00007f497ff0a567
[589.572] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000557f558abfa0
[589.573] RBP: 0000557f558a6ba0 R08: 0000000000000000 R09: 00007ffc98346100
[589.573] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[589.573] R13: 0000557f558abfa0 R14: 0000557f558a6cb0 R15: 0000557f558a6dd0
[589.573] </TASK>
[589.574] Modules linked in: dm_snapshot dm_thin_pool (...)
[589.576] ---[ end trace 0000000000000000 ]---
Fix this by adding a runtime flag to the block group to tell that the
block group is still in the list of new block groups, and therefore it
should not be moved to the list of unused block groups, at
btrfs_mark_bg_unused(), until the flag is cleared, when we finish the
creation of the block group at btrfs_create_pending_block_groups().
Fixes: a9f189716cf1 ("btrfs: move out now unused BG from the reclaim list")
CC: stable(a)vger.kernel.org # 5.15+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 6753524b146c..f53297726238 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1640,13 +1640,14 @@ void btrfs_mark_bg_unused(struct btrfs_block_group *bg)
{
struct btrfs_fs_info *fs_info = bg->fs_info;
- trace_btrfs_add_unused_block_group(bg);
spin_lock(&fs_info->unused_bgs_lock);
if (list_empty(&bg->bg_list)) {
btrfs_get_block_group(bg);
+ trace_btrfs_add_unused_block_group(bg);
list_add_tail(&bg->bg_list, &fs_info->unused_bgs);
- } else {
+ } else if (!test_bit(BLOCK_GROUP_FLAG_NEW, &bg->runtime_flags)) {
/* Pull out the block group from the reclaim_bgs list. */
+ trace_btrfs_add_unused_block_group(bg);
list_move_tail(&bg->bg_list, &fs_info->unused_bgs);
}
spin_unlock(&fs_info->unused_bgs_lock);
@@ -2668,6 +2669,7 @@ void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans)
next:
btrfs_delayed_refs_rsv_release(fs_info, 1);
list_del_init(&block_group->bg_list);
+ clear_bit(BLOCK_GROUP_FLAG_NEW, &block_group->runtime_flags);
}
btrfs_trans_release_chunk_metadata(trans);
}
@@ -2707,6 +2709,13 @@ struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *tran
if (!cache)
return ERR_PTR(-ENOMEM);
+ /*
+ * Mark it as new before adding it to the rbtree of block groups or any
+ * list, so that no other task finds it and calls btrfs_mark_bg_unused()
+ * before the new flag is set.
+ */
+ set_bit(BLOCK_GROUP_FLAG_NEW, &cache->runtime_flags);
+
cache->length = size;
set_free_space_tree_thresholds(cache);
cache->flags = type;
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index f204addc3fe8..381c54a56417 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -70,6 +70,11 @@ enum btrfs_block_group_flags {
BLOCK_GROUP_FLAG_NEEDS_FREE_SPACE,
/* Indicate that the block group is placed on a sequential zone */
BLOCK_GROUP_FLAG_SEQUENTIAL_ZONE,
+ /*
+ * Indicate that block group is in the list of new block groups of a
+ * transaction.
+ */
+ BLOCK_GROUP_FLAG_NEW,
};
enum btrfs_caching_type {
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 7a93c71a6714ca1a9c03d70432dac104b0cfb815
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072304-prepay-unread-75ce@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
7a93c71a6714 ("maple_tree: fix 32 bit mas_next testing")
eb2e817f38ca ("maple_tree: update testing code for mas_{next,prev,walk}")
eaf9790d3bc6 ("maple_tree: add __init and __exit to test module")
4bd6dded6318 ("test_maple_tree: add more testing for mas_empty_area()")
5159d64b3354 ("test_maple_tree: test modifications while iterating")
7327e8111adb ("maple_tree: fix mas_empty_area_rev() lower bound validation")
c5651b31f515 ("test_maple_tree: add test for mas_spanning_rebalance() on insufficient data")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7a93c71a6714ca1a9c03d70432dac104b0cfb815 Mon Sep 17 00:00:00 2001
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Date: Wed, 12 Jul 2023 13:39:15 -0400
Subject: [PATCH] maple_tree: fix 32 bit mas_next testing
The test setup of mas_next is dependent on node entry size to create a 2
level tree, but the tests did not account for this in the expected value
when shifting beyond the scope of the tree.
Fix this by setting up the test to succeed depending on the node entries
which is dependent on the 32/64 bit setup.
Link: https://lkml.kernel.org/r/20230712173916.168805-1-Liam.Howlett@oracle.com
Fixes: 120b116208a0 ("maple_tree: reorganize testing to restore module testing")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
Closes: https://lore.kernel.org/linux-mm/CAMuHMdV4T53fOw7VPoBgPR7fP6RYqf=CBhD_y_vOg…
Tested-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/lib/test_maple_tree.c b/lib/test_maple_tree.c
index 9939be34e516..8d4c92cbdd0c 100644
--- a/lib/test_maple_tree.c
+++ b/lib/test_maple_tree.c
@@ -1898,13 +1898,16 @@ static noinline void __init next_prev_test(struct maple_tree *mt)
725};
static const unsigned long level2_32[] = { 1747, 2000, 1750, 1755,
1760, 1765};
+ unsigned long last_index;
if (MAPLE_32BIT) {
nr_entries = 500;
level2 = level2_32;
+ last_index = 0x138e;
} else {
nr_entries = 200;
level2 = level2_64;
+ last_index = 0x7d6;
}
for (i = 0; i <= nr_entries; i++)
@@ -2011,7 +2014,7 @@ static noinline void __init next_prev_test(struct maple_tree *mt)
val = mas_next(&mas, ULONG_MAX);
MT_BUG_ON(mt, val != NULL);
- MT_BUG_ON(mt, mas.index != 0x7d6);
+ MT_BUG_ON(mt, mas.index != last_index);
MT_BUG_ON(mt, mas.last != ULONG_MAX);
val = mas_prev(&mas, 0);
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 7a93c71a6714ca1a9c03d70432dac104b0cfb815
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072303-possible-backwash-4c0f@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7a93c71a6714ca1a9c03d70432dac104b0cfb815 Mon Sep 17 00:00:00 2001
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Date: Wed, 12 Jul 2023 13:39:15 -0400
Subject: [PATCH] maple_tree: fix 32 bit mas_next testing
The test setup of mas_next is dependent on node entry size to create a 2
level tree, but the tests did not account for this in the expected value
when shifting beyond the scope of the tree.
Fix this by setting up the test to succeed depending on the node entries
which is dependent on the 32/64 bit setup.
Link: https://lkml.kernel.org/r/20230712173916.168805-1-Liam.Howlett@oracle.com
Fixes: 120b116208a0 ("maple_tree: reorganize testing to restore module testing")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
Closes: https://lore.kernel.org/linux-mm/CAMuHMdV4T53fOw7VPoBgPR7fP6RYqf=CBhD_y_vOg…
Tested-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/lib/test_maple_tree.c b/lib/test_maple_tree.c
index 9939be34e516..8d4c92cbdd0c 100644
--- a/lib/test_maple_tree.c
+++ b/lib/test_maple_tree.c
@@ -1898,13 +1898,16 @@ static noinline void __init next_prev_test(struct maple_tree *mt)
725};
static const unsigned long level2_32[] = { 1747, 2000, 1750, 1755,
1760, 1765};
+ unsigned long last_index;
if (MAPLE_32BIT) {
nr_entries = 500;
level2 = level2_32;
+ last_index = 0x138e;
} else {
nr_entries = 200;
level2 = level2_64;
+ last_index = 0x7d6;
}
for (i = 0; i <= nr_entries; i++)
@@ -2011,7 +2014,7 @@ static noinline void __init next_prev_test(struct maple_tree *mt)
val = mas_next(&mas, ULONG_MAX);
MT_BUG_ON(mt, val != NULL);
- MT_BUG_ON(mt, mas.index != 0x7d6);
+ MT_BUG_ON(mt, mas.index != last_index);
MT_BUG_ON(mt, mas.last != ULONG_MAX);
val = mas_prev(&mas, 0);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x a9be202269580ca611c6cebac90eaf1795497800
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072352-book-slab-df31@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
a9be20226958 ("io_uring: treat -EAGAIN for REQ_F_NOWAIT as final for io-wq")
ed29b0b4fd83 ("io_uring: move to separate directory")
d01905db14eb ("io_uring: clean iowq submit work cancellation")
255657d23704 ("io_uring: clean io_wq_submit_work()'s main loop")
90fa02883f06 ("io_uring: implement async hybrid mode for pollable requests")
3b44b3712c5b ("io_uring: split logic of force_nonblock")
9882131cd9de ("io_uring: kill io_wq_current_is_worker() in iopoll")
9983028e7660 ("io_uring: optimise req->ctx reloads")
5e49c973fc39 ("io_uring: clean up io_import_iovec")
51aac424aef9 ("io_uring: optimise io_import_iovec nonblock passing")
c88598a92a58 ("io_uring: optimise read/write iov state storing")
538941e2681c ("io_uring: encapsulate rw state")
d886e185a128 ("io_uring: control ->async_data with a REQ_F flag")
30d51dd4ad20 ("io_uring: clean up buffer select")
ef05d9ebcc92 ("io_uring: kill off ->inflight_entry field")
6f33b0bc4ea4 ("io_uring: use slist for completion batching")
c450178d9be9 ("io_uring: dedup CQE flushing non-empty checks")
4c928904ff77 ("block: move CONFIG_BLOCK guard to top Makefile")
14cfbb7a7856 ("io_uring: fix wrong condition to grab uring lock")
7df778be2f61 ("io_uring: make OP_CLOSE consistent with direct open")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a9be202269580ca611c6cebac90eaf1795497800 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Thu, 20 Jul 2023 13:16:53 -0600
Subject: [PATCH] io_uring: treat -EAGAIN for REQ_F_NOWAIT as final for io-wq
io-wq assumes that an issue is blocking, but it may not be if the
request type has asked for a non-blocking attempt. If we get
-EAGAIN for that case, then we need to treat it as a final result
and not retry or arm poll for it.
Cc: stable(a)vger.kernel.org # 5.10+
Link: https://github.com/axboe/liburing/issues/897
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index a9923676d16d..5e97235a82d6 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1948,6 +1948,14 @@ void io_wq_submit_work(struct io_wq_work *work)
ret = io_issue_sqe(req, issue_flags);
if (ret != -EAGAIN)
break;
+
+ /*
+ * If REQ_F_NOWAIT is set, then don't wait or retry with
+ * poll. -EAGAIN is final for that case.
+ */
+ if (req->flags & REQ_F_NOWAIT)
+ break;
+
/*
* We can get EAGAIN for iopolled IO even though we're
* forcing a sync submission from here, since we can't
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 69ea4c9d02b7947cdd612335a61cc1a02e544ccd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072308-sandbox-blemish-85a3@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
69ea4c9d02b7 ("ALSA: hda/realtek - remove 3k pull low procedure")
f30741cded62 ("ALSA: hda/realtek: Fix audio regression on Mi Notebook Pro 2020")
5aec98913095 ("ALSA: hda/realtek - ALC236 headset MIC recording issue")
92666d45adcf ("ALSA: hda/realtek - Fixed Dell AIO wrong sound tone")
9e885770277d ("ALSA: hda/realtek - HP Headset Mic can't detect after boot")
a0ccbc5319d5 ("ALSA: hda/realtek - Add supported mute Led for HP")
446b8185f0c3 ("ALSA: hda/realtek - Add supported for Lenovo ThinkPad Headset Button")
ef9ce66fab95 ("ALSA: hda/realtek - Enable headphone for ASUS TM420")
8a8de09cb2ad ("ALSA: hda/realtek - Fixed HP headset Mic can't be detected")
08befca40026 ("ALSA: hda/realtek - Add mute Led support for HP Elitebook 845 G7")
13468bfa8c58 ("ALSA: hda/realtek - set mic to auto detect on a HP AIO machine")
3f7424905782 ("ALSA: hda/realtek - Couldn't detect Mic if booting with headset plugged")
fc19d559b0d3 ("ALSA: hda/realtek - The Mic on a RedmiBook doesn't work")
23dc95868944 ("ALSA: hda/realtek: Add model alc298-samsung-headphone")
e2d2fded6bdf ("ALSA: hda/realtek: Fix pin default on Intel NUC 8 Rugged")
3b5d1afd1f13 ("Merge branch 'for-next' into for-linus")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 69ea4c9d02b7947cdd612335a61cc1a02e544ccd Mon Sep 17 00:00:00 2001
From: Kailang Yang <kailang(a)realtek.com>
Date: Thu, 13 Jul 2023 15:57:13 +0800
Subject: [PATCH] ALSA: hda/realtek - remove 3k pull low procedure
This was the ALC283 depop procedure.
Maybe this procedure wasn't suitable with new codec.
So, let us remove it. But HP 15z-fc000 must do 3k pull low. If it
reboot with plugged headset,
it will have errors show don't find codec error messages. Run 3k pull
low will solve issues.
So, let AMD chipset will run this for workarround.
Fixes: 5aec98913095 ("ALSA: hda/realtek - ALC236 headset MIC recording issue")
Signed-off-by: Kailang Yang <kailang(a)realtek.com>
Cc: <stable(a)vger.kernel.org>
Reported-by: Joseph C. Sible <josephcsible(a)gmail.com>
Closes: https://lore.kernel.org/r/CABpewhE4REgn9RJZduuEU6Z_ijXNeQWnrxO1tg70Gkw=F8qN…
Link: https://lore.kernel.org/r/4678992299664babac4403d9978e7ba7@realtek.com
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index e2f8b608de82..afb4d82475b4 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -122,6 +122,7 @@ struct alc_spec {
unsigned int ultra_low_power:1;
unsigned int has_hs_key:1;
unsigned int no_internal_mic_pin:1;
+ unsigned int en_3kpull_low:1;
/* for PLL fix */
hda_nid_t pll_nid;
@@ -3622,6 +3623,7 @@ static void alc256_shutup(struct hda_codec *codec)
if (!hp_pin)
hp_pin = 0x21;
+ alc_update_coefex_idx(codec, 0x57, 0x04, 0x0007, 0x1); /* Low power */
hp_pin_sense = snd_hda_jack_detect(codec, hp_pin);
if (hp_pin_sense)
@@ -3638,8 +3640,7 @@ static void alc256_shutup(struct hda_codec *codec)
/* If disable 3k pulldown control for alc257, the Mic detection will not work correctly
* when booting with headset plugged. So skip setting it for the codec alc257
*/
- if (codec->core.vendor_id != 0x10ec0236 &&
- codec->core.vendor_id != 0x10ec0257)
+ if (spec->en_3kpull_low)
alc_update_coef_idx(codec, 0x46, 0, 3 << 12);
if (!spec->no_shutup_pins)
@@ -10682,6 +10683,8 @@ static int patch_alc269(struct hda_codec *codec)
spec->shutup = alc256_shutup;
spec->init_hook = alc256_init;
spec->gen.mixer_nid = 0; /* ALC256 does not have any loopback mixer path */
+ if (codec->bus->pci->vendor == PCI_VENDOR_ID_AMD)
+ spec->en_3kpull_low = true;
break;
case 0x10ec0257:
spec->codec_variant = ALC269_TYPE_ALC257;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 69ea4c9d02b7947cdd612335a61cc1a02e544ccd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072307-usage-safely-2a32@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
69ea4c9d02b7 ("ALSA: hda/realtek - remove 3k pull low procedure")
f30741cded62 ("ALSA: hda/realtek: Fix audio regression on Mi Notebook Pro 2020")
5aec98913095 ("ALSA: hda/realtek - ALC236 headset MIC recording issue")
92666d45adcf ("ALSA: hda/realtek - Fixed Dell AIO wrong sound tone")
9e885770277d ("ALSA: hda/realtek - HP Headset Mic can't detect after boot")
a0ccbc5319d5 ("ALSA: hda/realtek - Add supported mute Led for HP")
446b8185f0c3 ("ALSA: hda/realtek - Add supported for Lenovo ThinkPad Headset Button")
ef9ce66fab95 ("ALSA: hda/realtek - Enable headphone for ASUS TM420")
8a8de09cb2ad ("ALSA: hda/realtek - Fixed HP headset Mic can't be detected")
08befca40026 ("ALSA: hda/realtek - Add mute Led support for HP Elitebook 845 G7")
13468bfa8c58 ("ALSA: hda/realtek - set mic to auto detect on a HP AIO machine")
3f7424905782 ("ALSA: hda/realtek - Couldn't detect Mic if booting with headset plugged")
fc19d559b0d3 ("ALSA: hda/realtek - The Mic on a RedmiBook doesn't work")
23dc95868944 ("ALSA: hda/realtek: Add model alc298-samsung-headphone")
e2d2fded6bdf ("ALSA: hda/realtek: Fix pin default on Intel NUC 8 Rugged")
3b5d1afd1f13 ("Merge branch 'for-next' into for-linus")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 69ea4c9d02b7947cdd612335a61cc1a02e544ccd Mon Sep 17 00:00:00 2001
From: Kailang Yang <kailang(a)realtek.com>
Date: Thu, 13 Jul 2023 15:57:13 +0800
Subject: [PATCH] ALSA: hda/realtek - remove 3k pull low procedure
This was the ALC283 depop procedure.
Maybe this procedure wasn't suitable with new codec.
So, let us remove it. But HP 15z-fc000 must do 3k pull low. If it
reboot with plugged headset,
it will have errors show don't find codec error messages. Run 3k pull
low will solve issues.
So, let AMD chipset will run this for workarround.
Fixes: 5aec98913095 ("ALSA: hda/realtek - ALC236 headset MIC recording issue")
Signed-off-by: Kailang Yang <kailang(a)realtek.com>
Cc: <stable(a)vger.kernel.org>
Reported-by: Joseph C. Sible <josephcsible(a)gmail.com>
Closes: https://lore.kernel.org/r/CABpewhE4REgn9RJZduuEU6Z_ijXNeQWnrxO1tg70Gkw=F8qN…
Link: https://lore.kernel.org/r/4678992299664babac4403d9978e7ba7@realtek.com
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index e2f8b608de82..afb4d82475b4 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -122,6 +122,7 @@ struct alc_spec {
unsigned int ultra_low_power:1;
unsigned int has_hs_key:1;
unsigned int no_internal_mic_pin:1;
+ unsigned int en_3kpull_low:1;
/* for PLL fix */
hda_nid_t pll_nid;
@@ -3622,6 +3623,7 @@ static void alc256_shutup(struct hda_codec *codec)
if (!hp_pin)
hp_pin = 0x21;
+ alc_update_coefex_idx(codec, 0x57, 0x04, 0x0007, 0x1); /* Low power */
hp_pin_sense = snd_hda_jack_detect(codec, hp_pin);
if (hp_pin_sense)
@@ -3638,8 +3640,7 @@ static void alc256_shutup(struct hda_codec *codec)
/* If disable 3k pulldown control for alc257, the Mic detection will not work correctly
* when booting with headset plugged. So skip setting it for the codec alc257
*/
- if (codec->core.vendor_id != 0x10ec0236 &&
- codec->core.vendor_id != 0x10ec0257)
+ if (spec->en_3kpull_low)
alc_update_coef_idx(codec, 0x46, 0, 3 << 12);
if (!spec->no_shutup_pins)
@@ -10682,6 +10683,8 @@ static int patch_alc269(struct hda_codec *codec)
spec->shutup = alc256_shutup;
spec->init_hook = alc256_init;
spec->gen.mixer_nid = 0; /* ALC256 does not have any loopback mixer path */
+ if (codec->bus->pci->vendor == PCI_VENDOR_ID_AMD)
+ spec->en_3kpull_low = true;
break;
case 0x10ec0257:
spec->codec_variant = ALC269_TYPE_ALC257;
Unlike Intel's Enhanced IBRS feature, AMD's Automatic IBRS does not
provide protection to processes running at CPL3/user mode [1].
Explicitly enable STIBP to protect against cross-thread CPL3
branch target injections on systems with Automatic IBRS enabled.
Also update the relevant documentation.
The first version of the original AutoIBRS patchseries enabled STIBP
always-on, but it got dropped by mistake in v2 and on.
[1] "AMD64 Architecture Programmer's Manual Volume 2: System Programming",
Pub. 24593, rev. 3.41, June 2023, Part 1, Section 3.1.7 "Extended
Feature Enable Register (EFER)" - accessible via Link.
Reported-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Fixes: e7862eda309e ("x86/cpu: Support AMD Automatic IBRS")
Link: https://bugzilla.kernel.org/attachment.cgi?id=304652
Signed-off-by: Kim Phillips <kim.phillips(a)amd.com>
Cc: Borislav Petkov (AMD) <bp(a)alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Joao Martins <joao.m.martins(a)oracle.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Sean Christopherson <seanjc(a)google.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: David Woodhouse <dwmw(a)amazon.co.uk>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Tony Luck <tony.luck(a)intel.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: Alexey Kardashevskiy <aik(a)amd.com>
Cc: kvm(a)vger.kernel.org
Cc: linux-doc(a)vger.kernel.org
Cc: x86(a)kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
---
Documentation/admin-guide/hw-vuln/spectre.rst | 11 +++++++----
arch/x86/kernel/cpu/bugs.c | 15 +++++++++------
2 files changed, 16 insertions(+), 10 deletions(-)
diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index 4d186f599d90..32a8893e5617 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -484,11 +484,14 @@ Spectre variant 2
Systems which support enhanced IBRS (eIBRS) enable IBRS protection once at
boot, by setting the IBRS bit, and they're automatically protected against
- Spectre v2 variant attacks, including cross-thread branch target injections
- on SMT systems (STIBP). In other words, eIBRS enables STIBP too.
+ Spectre v2 variant attacks.
- Legacy IBRS systems clear the IBRS bit on exit to userspace and
- therefore explicitly enable STIBP for that
+ On Intel's enhanced IBRS systems, this includes cross-thread branch target
+ injections on SMT systems (STIBP). In other words, Intel eIBRS enables
+ STIBP, too.
+
+ AMD Automatic IBRS does not protect userspace, and Legacy IBRS systems clear
+ the IBRS bit on exit to userspace, therefore both explicitly enable STIBP.
The retpoline mitigation is turned on by default on vulnerable
CPUs. It can be forced on or off by the administrator
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 9e2a91830f72..95507448e781 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1150,19 +1150,21 @@ spectre_v2_user_select_mitigation(void)
}
/*
- * If no STIBP, enhanced IBRS is enabled, or SMT impossible, STIBP
+ * If no STIBP, Intel enhanced IBRS is enabled, or SMT impossible, STIBP
* is not required.
*
- * Enhanced IBRS also protects against cross-thread branch target
+ * Intel's Enhanced IBRS also protects against cross-thread branch target
* injection in user-mode as the IBRS bit remains always set which
* implicitly enables cross-thread protections. However, in legacy IBRS
* mode, the IBRS bit is set only on kernel entry and cleared on return
- * to userspace. This disables the implicit cross-thread protection,
- * so allow for STIBP to be selected in that case.
+ * to userspace. AMD Automatic IBRS also does not protect userspace.
+ * These modes therefore disable the implicit cross-thread protection,
+ * so allow for STIBP to be selected in those cases.
*/
if (!boot_cpu_has(X86_FEATURE_STIBP) ||
!smt_possible ||
- spectre_v2_in_eibrs_mode(spectre_v2_enabled))
+ (spectre_v2_in_eibrs_mode(spectre_v2_enabled) &&
+ !boot_cpu_has(X86_FEATURE_AUTOIBRS)))
return;
/*
@@ -2294,7 +2296,8 @@ static ssize_t mmio_stale_data_show_state(char *buf)
static char *stibp_state(void)
{
- if (spectre_v2_in_eibrs_mode(spectre_v2_enabled))
+ if (spectre_v2_in_eibrs_mode(spectre_v2_enabled) &&
+ !boot_cpu_has(X86_FEATURE_AUTOIBRS))
return "";
switch (spectre_v2_user_stibp) {
--
2.34.1
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 6018b585e8c6fa7d85d4b38d9ce49a5b67be7078
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072114-radiantly-gilled-72c0@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
6018b585e8c6 ("tracing/histograms: Add histograms to hist_vars if they have referenced variables")
7d18a10c3167 ("tracing: Refactor hist trigger action code")
036876fa5620 ("tracing: Have the historgram use the result of str_has_prefix() for len of prefix")
754481e6954c ("tracing: Use str_has_prefix() helper for histogram code")
de40f033d4e8 ("tracing: Remove open-coding of hist trigger var_ref management")
2f31ed9308cc ("tracing: Change strlen to sizeof for hist trigger static strings")
0e2b81f7b52a ("tracing: Remove unneeded synth_event_mutex")
7bbab38d07f3 ("tracing: Use dyn_event framework for synthetic events")
faacb361f271 ("tracing: Simplify creation and deletion of synthetic events")
fc800a10be26 ("tracing: Lock event_mutex before synth_event_mutex")
343a9f35409b ("Merge tag 'trace-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6018b585e8c6fa7d85d4b38d9ce49a5b67be7078 Mon Sep 17 00:00:00 2001
From: Mohamed Khalfella <mkhalfella(a)purestorage.com>
Date: Wed, 12 Jul 2023 22:30:21 +0000
Subject: [PATCH] tracing/histograms: Add histograms to hist_vars if they have
referenced variables
Hist triggers can have referenced variables without having direct
variables fields. This can be the case if referenced variables are added
for trigger actions. In this case the newly added references will not
have field variables. Not taking such referenced variables into
consideration can result in a bug where it would be possible to remove
hist trigger with variables being refenced. This will result in a bug
that is easily reproducable like so
$ cd /sys/kernel/tracing
$ echo 'synthetic_sys_enter char[] comm; long id' >> synthetic_events
$ echo 'hist:keys=common_pid.execname,id.syscall:vals=hitcount:comm=common_pid.execname' >> events/raw_syscalls/sys_enter/trigger
$ echo 'hist:keys=common_pid.execname,id.syscall:onmatch(raw_syscalls.sys_enter).synthetic_sys_enter($comm, id)' >> events/raw_syscalls/sys_enter/trigger
$ echo '!hist:keys=common_pid.execname,id.syscall:vals=hitcount:comm=common_pid.execname' >> events/raw_syscalls/sys_enter/trigger
[ 100.263533] ==================================================================
[ 100.264634] BUG: KASAN: slab-use-after-free in resolve_var_refs+0xc7/0x180
[ 100.265520] Read of size 8 at addr ffff88810375d0f0 by task bash/439
[ 100.266320]
[ 100.266533] CPU: 2 PID: 439 Comm: bash Not tainted 6.5.0-rc1 #4
[ 100.267277] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[ 100.268561] Call Trace:
[ 100.268902] <TASK>
[ 100.269189] dump_stack_lvl+0x4c/0x70
[ 100.269680] print_report+0xc5/0x600
[ 100.270165] ? resolve_var_refs+0xc7/0x180
[ 100.270697] ? kasan_complete_mode_report_info+0x80/0x1f0
[ 100.271389] ? resolve_var_refs+0xc7/0x180
[ 100.271913] kasan_report+0xbd/0x100
[ 100.272380] ? resolve_var_refs+0xc7/0x180
[ 100.272920] __asan_load8+0x71/0xa0
[ 100.273377] resolve_var_refs+0xc7/0x180
[ 100.273888] event_hist_trigger+0x749/0x860
[ 100.274505] ? kasan_save_stack+0x2a/0x50
[ 100.275024] ? kasan_set_track+0x29/0x40
[ 100.275536] ? __pfx_event_hist_trigger+0x10/0x10
[ 100.276138] ? ksys_write+0xd1/0x170
[ 100.276607] ? do_syscall_64+0x3c/0x90
[ 100.277099] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 100.277771] ? destroy_hist_data+0x446/0x470
[ 100.278324] ? event_hist_trigger_parse+0xa6c/0x3860
[ 100.278962] ? __pfx_event_hist_trigger_parse+0x10/0x10
[ 100.279627] ? __kasan_check_write+0x18/0x20
[ 100.280177] ? mutex_unlock+0x85/0xd0
[ 100.280660] ? __pfx_mutex_unlock+0x10/0x10
[ 100.281200] ? kfree+0x7b/0x120
[ 100.281619] ? ____kasan_slab_free+0x15d/0x1d0
[ 100.282197] ? event_trigger_write+0xac/0x100
[ 100.282764] ? __kasan_slab_free+0x16/0x20
[ 100.283293] ? __kmem_cache_free+0x153/0x2f0
[ 100.283844] ? sched_mm_cid_remote_clear+0xb1/0x250
[ 100.284550] ? __pfx_sched_mm_cid_remote_clear+0x10/0x10
[ 100.285221] ? event_trigger_write+0xbc/0x100
[ 100.285781] ? __kasan_check_read+0x15/0x20
[ 100.286321] ? __bitmap_weight+0x66/0xa0
[ 100.286833] ? _find_next_bit+0x46/0xe0
[ 100.287334] ? task_mm_cid_work+0x37f/0x450
[ 100.287872] event_triggers_call+0x84/0x150
[ 100.288408] trace_event_buffer_commit+0x339/0x430
[ 100.289073] ? ring_buffer_event_data+0x3f/0x60
[ 100.292189] trace_event_raw_event_sys_enter+0x8b/0xe0
[ 100.295434] syscall_trace_enter.constprop.0+0x18f/0x1b0
[ 100.298653] syscall_enter_from_user_mode+0x32/0x40
[ 100.301808] do_syscall_64+0x1a/0x90
[ 100.304748] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 100.307775] RIP: 0033:0x7f686c75c1cb
[ 100.310617] Code: 73 01 c3 48 8b 0d 65 3c 10 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 21 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 35 3c 10 00 f7 d8 64 89 01 48
[ 100.317847] RSP: 002b:00007ffc60137a38 EFLAGS: 00000246 ORIG_RAX: 0000000000000021
[ 100.321200] RAX: ffffffffffffffda RBX: 000055f566469ea0 RCX: 00007f686c75c1cb
[ 100.324631] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 000000000000000a
[ 100.328104] RBP: 00007ffc60137ac0 R08: 00007f686c818460 R09: 000000000000000a
[ 100.331509] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000009
[ 100.334992] R13: 0000000000000007 R14: 000000000000000a R15: 0000000000000007
[ 100.338381] </TASK>
We hit the bug because when second hist trigger has was created
has_hist_vars() returned false because hist trigger did not have
variables. As a result of that save_hist_vars() was not called to add
the trigger to trace_array->hist_vars. Later on when we attempted to
remove the first histogram find_any_var_ref() failed to detect it is
being used because it did not find the second trigger in hist_vars list.
With this change we wait until trigger actions are created so we can take
into consideration if hist trigger has variable references. Also, now we
check the return value of save_hist_vars() and fail trigger creation if
save_hist_vars() fails.
Link: https://lore.kernel.org/linux-trace-kernel/20230712223021.636335-1-mkhalfel…
Cc: stable(a)vger.kernel.org
Fixes: 067fe038e70f6 ("tracing: Add variable reference handling to hist triggers")
Signed-off-by: Mohamed Khalfella <mkhalfella(a)purestorage.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index b97d3ad832f1..c8c61381eba4 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -6663,13 +6663,15 @@ static int event_hist_trigger_parse(struct event_command *cmd_ops,
if (get_named_trigger_data(trigger_data))
goto enable;
- if (has_hist_vars(hist_data))
- save_hist_vars(hist_data);
-
ret = create_actions(hist_data);
if (ret)
goto out_unreg;
+ if (has_hist_vars(hist_data) || hist_data->n_var_refs) {
+ if (save_hist_vars(hist_data))
+ goto out_unreg;
+ }
+
ret = tracing_map_init(hist_data->map);
if (ret)
goto out_unreg;
Den 2023-07-21 kl. 19:05, skrev Greg Kroah-Hartman:
> From: Christian König <christian.koenig(a)amd.com>
>
> commit a2848d08742c8e8494675892c02c0d22acbe3cf8 upstream.
>
> There is a small window where we have already incremented the pin count
> but not yet moved the bo from the lru to the pinned list.
>
> Signed-off-by: Christian König <christian.koenig(a)amd.com>
> Reported-by: Pelloux-Prayer, Pierre-Eric <Pierre-eric.Pelloux-prayer(a)amd.com>
> Tested-by: Pelloux-Prayer, Pierre-Eric <Pierre-eric.Pelloux-prayer(a)amd.com>
> Acked-by: Alex Deucher <alexander.deucher(a)amd.com>
> Cc: stable(a)vger.kernel.org
> Link: https://patchwork.freedesktop.org/patch/msgid/20230707120826.3701-1-christi…
> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> ---
> drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -517,6 +517,12 @@ static bool ttm_bo_evict_swapout_allowab
> {
> bool ret = false;
>
> + if (bo->pin_count) {
> + *locked = false;
> + *busy = false;
> + return false;
> + }
> +
> if (bo->base.resv == ctx->resv) {
> dma_resv_assert_held(bo->base.resv);
> if (ctx->allow_res_evict)
>
This one will trigger GPF and needs a follow-up fix that is not upstream
yet:
https://patchwork.freedesktop.org/patch/547897/
as reported on LKML in thread:
[bug/bisected] commit a2848d08742c8e8494675892c02c0d22acbe3cf8 cause
general protection fault, probably for non-canonical address
0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI
--
Thomas
On hardware that supports Indirect Branch Tracking (IBT), Hyper-V VMs
with ConfigVersion 9.3 or later support IBT in the guest. However,
current versions of Hyper-V have a bug in that there's not an ENDBR64
instruction at the beginning of the hypercall page. Since hypercalls are
made with an indirect call to the hypercall page, all hypercall attempts
fail with an exception and Linux panics.
A Hyper-V fix is in progress to add ENDBR64. But guard against the Linux
panic by clearing X86_FEATURE_IBT if the hypercall page doesn't start
with ENDBR. The VM will boot and run without IBT.
If future Linux 32-bit kernels were to support IBT, additional hypercall
page hackery would be needed to make IBT work for such kernels in a
Hyper-V VM.
Cc: stable(a)vger.kernel.org
Signed-off-by: Michael Kelley <mikelley(a)microsoft.com>
---
arch/x86/hyperv/hv_init.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 6c04b52..5cbee24 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -14,6 +14,7 @@
#include <asm/apic.h>
#include <asm/desc.h>
#include <asm/sev.h>
+#include <asm/ibt.h>
#include <asm/hypervisor.h>
#include <asm/hyperv-tlfs.h>
#include <asm/mshyperv.h>
@@ -472,6 +473,26 @@ void __init hyperv_init(void)
}
/*
+ * Some versions of Hyper-V that provide IBT in guest VMs have a bug
+ * in that there's no ENDBR64 instruction at the entry to the
+ * hypercall page. Because hypercalls are invoked via an indirect call
+ * to the hypercall page, all hypercall attempts fail when IBT is
+ * enabled, and Linux panics. For such buggy versions, disable IBT.
+ *
+ * Fixed versions of Hyper-V always provide ENDBR64 on the hypercall
+ * page, so if future Linux kernel versions enable IBT for 32-bit
+ * builds, additional hypercall page hackery will be required here
+ * to provide an ENDBR32.
+ */
+#ifdef CONFIG_X86_KERNEL_IBT
+ if (cpu_feature_enabled(X86_FEATURE_IBT) &&
+ *(u32 *)hv_hypercall_pg != gen_endbr()) {
+ setup_clear_cpu_cap(X86_FEATURE_IBT);
+ pr_info("Hyper-V: Disabling IBT because of Hyper-V bug\n");
+ }
+#endif
+
+ /*
* hyperv_init() is called before LAPIC is initialized: see
* apic_intr_mode_init() -> x86_platform.apic_post_init() and
* apic_bsp_setup() -> setup_local_APIC(). The direct-mode STIMER
--
1.8.3.1
From: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
When configuring a pin as an output pin with a value of logic 0, we
end up as having a value of logic 1 on the output pin. Setting a
logic 0 a second time (or more) after that will correctly output a
logic 0 on the output pin.
By default, all GPIO pins are configured as inputs. When we enter
sc16is7xx_gpio_direction_output() for the first time, we first set the
desired value in IOSTATE, and then we configure the pin as an output.
The datasheet states that writing to IOSTATE register will trigger a
transfer of the value to the I/O pin configured as output, so if the
pin is configured as an input, nothing will be transferred.
Therefore, set the direction first in IODIR, and then set the desired
value in IOSTATE.
This is what is done in NXP application note AN10587.
Fixes: dfeae619d781 ("serial: sc16is7xx")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
Reviewed-by: Lech Perczak <lech.perczak(a)camlingroup.com>
Tested-by: Lech Perczak <lech.perczak(a)camlingroup.com>
---
drivers/tty/serial/sc16is7xx.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index bc0a288f258d..07ae889db296 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -1342,9 +1342,18 @@ static int sc16is7xx_gpio_direction_output(struct gpio_chip *chip,
state |= BIT(offset);
else
state &= ~BIT(offset);
- sc16is7xx_port_write(port, SC16IS7XX_IOSTATE_REG, state);
+
+ /*
+ * If we write IOSTATE first, and then IODIR, the output value is not
+ * transferred to the corresponding I/O pin.
+ * The datasheet states that each register bit will be transferred to
+ * the corresponding I/O pin programmed as output when writing to
+ * IOSTATE. Therefore, configure direction first with IODIR, and then
+ * set value after with IOSTATE.
+ */
sc16is7xx_port_update(port, SC16IS7XX_IODIR_REG, BIT(offset),
BIT(offset));
+ sc16is7xx_port_write(port, SC16IS7XX_IOSTATE_REG, state);
return 0;
}
--
2.30.2
From: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
Some variants in this series of UART controllers have GPIO pins that
are shared between GPIO and modem control lines.
The pin mux mode (GPIO or modem control lines) can be set for each
ports (channels) supported by the variant.
This adds a property to the device tree to set the GPIO pin mux to
modem control lines on selected ports if needed.
Cc: <stable(a)vger.kernel.org> # 6.1.x
Signed-off-by: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
Acked-by: Conor Dooley <conor.dooley(a)microchip.com>
Reviewed-by: Lech Perczak <lech.perczak(a)camlingroup.com>
---
.../bindings/serial/nxp,sc16is7xx.txt | 46 +++++++++++++++++++
1 file changed, 46 insertions(+)
diff --git a/Documentation/devicetree/bindings/serial/nxp,sc16is7xx.txt b/Documentation/devicetree/bindings/serial/nxp,sc16is7xx.txt
index 0fa8e3e43bf8..1a7e4bff0456 100644
--- a/Documentation/devicetree/bindings/serial/nxp,sc16is7xx.txt
+++ b/Documentation/devicetree/bindings/serial/nxp,sc16is7xx.txt
@@ -23,6 +23,9 @@ Optional properties:
1 = active low.
- irda-mode-ports: An array that lists the indices of the port that
should operate in IrDA mode.
+- nxp,modem-control-line-ports: An array that lists the indices of the port that
+ should have shared GPIO lines configured as
+ modem control lines.
Example:
sc16is750: sc16is750@51 {
@@ -35,6 +38,26 @@ Example:
#gpio-cells = <2>;
};
+ sc16is752: sc16is752@53 {
+ compatible = "nxp,sc16is752";
+ reg = <0x53>;
+ clocks = <&clk20m>;
+ interrupt-parent = <&gpio3>;
+ interrupts = <7 IRQ_TYPE_EDGE_FALLING>;
+ nxp,modem-control-line-ports = <1>; /* Port 1 as modem control lines */
+ gpio-controller; /* Port 0 as GPIOs */
+ #gpio-cells = <2>;
+ };
+
+ sc16is752: sc16is752@54 {
+ compatible = "nxp,sc16is752";
+ reg = <0x54>;
+ clocks = <&clk20m>;
+ interrupt-parent = <&gpio3>;
+ interrupts = <7 IRQ_TYPE_EDGE_FALLING>;
+ nxp,modem-control-line-ports = <0 1>; /* Ports 0 and 1 as modem control lines */
+ };
+
* spi as bus
Required properties:
@@ -59,6 +82,9 @@ Optional properties:
1 = active low.
- irda-mode-ports: An array that lists the indices of the port that
should operate in IrDA mode.
+- nxp,modem-control-line-ports: An array that lists the indices of the port that
+ should have shared GPIO lines configured as
+ modem control lines.
Example:
sc16is750: sc16is750@0 {
@@ -70,3 +96,23 @@ Example:
gpio-controller;
#gpio-cells = <2>;
};
+
+ sc16is752: sc16is752@1 {
+ compatible = "nxp,sc16is752";
+ reg = <1>;
+ clocks = <&clk20m>;
+ interrupt-parent = <&gpio3>;
+ interrupts = <7 IRQ_TYPE_EDGE_FALLING>;
+ nxp,modem-control-line-ports = <1>; /* Port 1 as modem control lines */
+ gpio-controller; /* Port 0 as GPIOs */
+ #gpio-cells = <2>;
+ };
+
+ sc16is752: sc16is752@2 {
+ compatible = "nxp,sc16is752";
+ reg = <2>;
+ clocks = <&clk20m>;
+ interrupt-parent = <&gpio3>;
+ interrupts = <7 IRQ_TYPE_EDGE_FALLING>;
+ nxp,modem-control-line-ports = <0 1>; /* Ports 0 and 1 as modem control lines */
+ };
--
2.30.2
From: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
Bit SRESET (3) is cleared when a reset operation is completed. Having
the IOCONTROL register as non-volatile will always read SRESET as 1,
which is incorrect.
Also, if IOCONTROL register is not a volatile register, the upcoming
patch "serial: sc16is7xx: fix regression with GPIO configuration"
doesn't work when setting some shared GPIO lines as modem control
lines.
Therefore mark IOCONTROL register as a volatile register.
Cc: <stable(a)vger.kernel.org> # 6.1.x
Signed-off-by: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
Reviewed-by: Lech Perczak <lech.perczak(a)camlingroup.com>
Tested-by: Lech Perczak <lech.perczak(a)camlingroup.com>
---
drivers/tty/serial/sc16is7xx.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index 8ae2afc76a9b..306ae512b38a 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -488,6 +488,7 @@ static bool sc16is7xx_regmap_volatile(struct device *dev, unsigned int reg)
case SC16IS7XX_TXLVL_REG:
case SC16IS7XX_RXLVL_REG:
case SC16IS7XX_IOSTATE_REG:
+ case SC16IS7XX_IOCONTROL_REG:
return true;
default:
break;
--
2.30.2
From: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
The sc16is7xx_config_rs485() function is called only for the second
port (index 1, channel B), causing initialization problems for the
first port.
For the sc16is7xx driver, port->membase and port->mapbase are not set,
and their default values are 0. And we set port->iobase to the device
index. This means that when the first device is registered using the
uart_add_one_port() function, the following values will be in the port
structure:
port->membase = 0
port->mapbase = 0
port->iobase = 0
Therefore, the function uart_configure_port() in serial_core.c will
exit early because of the following check:
/*
* If there isn't a port here, don't do anything further.
*/
if (!port->iobase && !port->mapbase && !port->membase)
return;
Typically, I2C and SPI drivers do not set port->membase and
port->mapbase.
The max310x driver sets port->membase to ~0 (all ones). By
implementing the same change in this driver, uart_configure_port() is
now correctly executed for all ports.
Fixes: dfeae619d781 ("serial: sc16is7xx")
Cc: <stable(a)vger.kernel.org> # 6.1.x
Signed-off-by: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Reviewed-by: Lech Perczak <lech.perczak(a)camlingroup.com>
Tested-by: Lech Perczak <lech.perczak(a)camlingroup.com>
---
drivers/tty/serial/sc16is7xx.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index 2e7e7c409cf2..8ae2afc76a9b 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -1436,6 +1436,7 @@ static int sc16is7xx_probe(struct device *dev,
s->p[i].port.fifosize = SC16IS7XX_FIFO_SIZE;
s->p[i].port.flags = UPF_FIXED_TYPE | UPF_LOW_LATENCY;
s->p[i].port.iobase = i;
+ s->p[i].port.membase = (void __iomem *)~0;
s->p[i].port.iotype = UPIO_PORT;
s->p[i].port.uartclk = freq;
s->p[i].port.rs485_config = sc16is7xx_config_rs485;
--
2.30.2
Syzkaller reports use-after-free at addr_handler in 5.10 stable
releases. The problem was fixed in upstream and backported into
5.14, but wasn't applied to 5.10 and lower versions due to a small
merge conflict.
This patch is a modified version that can be cleanly applied to 5.10 and
5.4 stable branches.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x b1b9d3825df4c757d653d0b1df66f084835db9c3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072113-unveiling-lizard-c937@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
b1b9d3825df4 ("scsi: qla2xxx: Correct the index of array")
27258a577144 ("scsi: qla2xxx: Add a shadow variable to hold disc_state history of fcport")
58e39a2ce4be ("scsi: qla2xxx: Change discovery state before PLOGI")
983f127603fa ("scsi: qla2xxx: Retry PLOGI on FC-NVMe PRLI failure")
c76ae845ea83 ("scsi: qla2xxx: Add error handling for PLOGI ELS passthrough")
84ed362ac40c ("scsi: qla2xxx: Dual FCP-NVMe target port support")
f3f1938bb673 ("scsi: qla2xxx: Fix N2N link up fail")
7f2a398d59d6 ("scsi: qla2xxx: Fix N2N link reset")
ce0ba496dccf ("scsi: qla2xxx: Fix stuck login session")
897def200421 ("scsi: qla2xxx: Inline the qla2x00_fcport_event_handler() function")
0184793df2e8 ("scsi: qla2xxx: Use tabs instead of spaces for indentation")
a630bdc54f6d ("scsi: qla2xxx: Move qla2x00_set_fcport_state() from a .h into a .c file")
bd432bb53cff ("scsi: qla2xxx: Leave a blank line after declarations")
2703eaaf4eae ("scsi: qla2xxx: Use tabs to indent code")
a6a6d0589ac4 ("scsi: scsi_transport_fc: nvme: display FC-NVMe port roles")
f8f97b0c5b7f ("scsi: qla2xxx: Cleanups for NVRAM/Flash read/write path")
ecc89f25e225 ("scsi: qla2xxx: Add Device ID for ISP28XX")
24ef8f7eb5d0 ("scsi: qla2xxx: Fix routine qla27xx_dump_{mpi|ram}()")
df617ffbbc5e ("scsi: qla2xxx: Add fw_attr and port_no SysFS node")
64f61d994483 ("scsi: qla2xxx: Add new FW dump template entry types")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b1b9d3825df4c757d653d0b1df66f084835db9c3 Mon Sep 17 00:00:00 2001
From: Bikash Hazarika <bhazarika(a)marvell.com>
Date: Wed, 7 Jun 2023 17:08:42 +0530
Subject: [PATCH] scsi: qla2xxx: Correct the index of array
Klocwork reported array 'port_dstate_str' of size 10 may use index value(s)
10..15.
Add a fix to correct the index of array.
Cc: stable(a)vger.kernel.org
Signed-off-by: Bikash Hazarika <bhazarika(a)marvell.com>
Signed-off-by: Nilesh Javali <njavali(a)marvell.com>
Link: https://lore.kernel.org/r/20230607113843.37185-8-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani(a)oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/qla2xxx/qla_inline.h b/drivers/scsi/qla2xxx/qla_inline.h
index cce6e425c121..946a39504a35 100644
--- a/drivers/scsi/qla2xxx/qla_inline.h
+++ b/drivers/scsi/qla2xxx/qla_inline.h
@@ -109,11 +109,13 @@ qla2x00_set_fcport_disc_state(fc_port_t *fcport, int state)
{
int old_val;
uint8_t shiftbits, mask;
+ uint8_t port_dstate_str_sz;
/* This will have to change when the max no. of states > 16 */
shiftbits = 4;
mask = (1 << shiftbits) - 1;
+ port_dstate_str_sz = sizeof(port_dstate_str) / sizeof(char *);
fcport->disc_state = state;
while (1) {
old_val = atomic_read(&fcport->shadow_disc_state);
@@ -121,7 +123,8 @@ qla2x00_set_fcport_disc_state(fc_port_t *fcport, int state)
old_val, (old_val << shiftbits) | state)) {
ql_dbg(ql_dbg_disc, fcport->vha, 0x2134,
"FCPort %8phC disc_state transition: %s to %s - portid=%06x.\n",
- fcport->port_name, port_dstate_str[old_val & mask],
+ fcport->port_name, (old_val & mask) < port_dstate_str_sz ?
+ port_dstate_str[old_val & mask] : "Unknown",
port_dstate_str[state], fcport->d_id.b24);
return;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x b1b9d3825df4c757d653d0b1df66f084835db9c3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072112-plywood-delusion-dd8d@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
b1b9d3825df4 ("scsi: qla2xxx: Correct the index of array")
27258a577144 ("scsi: qla2xxx: Add a shadow variable to hold disc_state history of fcport")
58e39a2ce4be ("scsi: qla2xxx: Change discovery state before PLOGI")
983f127603fa ("scsi: qla2xxx: Retry PLOGI on FC-NVMe PRLI failure")
c76ae845ea83 ("scsi: qla2xxx: Add error handling for PLOGI ELS passthrough")
84ed362ac40c ("scsi: qla2xxx: Dual FCP-NVMe target port support")
f3f1938bb673 ("scsi: qla2xxx: Fix N2N link up fail")
7f2a398d59d6 ("scsi: qla2xxx: Fix N2N link reset")
ce0ba496dccf ("scsi: qla2xxx: Fix stuck login session")
897def200421 ("scsi: qla2xxx: Inline the qla2x00_fcport_event_handler() function")
0184793df2e8 ("scsi: qla2xxx: Use tabs instead of spaces for indentation")
a630bdc54f6d ("scsi: qla2xxx: Move qla2x00_set_fcport_state() from a .h into a .c file")
bd432bb53cff ("scsi: qla2xxx: Leave a blank line after declarations")
2703eaaf4eae ("scsi: qla2xxx: Use tabs to indent code")
a6a6d0589ac4 ("scsi: scsi_transport_fc: nvme: display FC-NVMe port roles")
f8f97b0c5b7f ("scsi: qla2xxx: Cleanups for NVRAM/Flash read/write path")
ecc89f25e225 ("scsi: qla2xxx: Add Device ID for ISP28XX")
24ef8f7eb5d0 ("scsi: qla2xxx: Fix routine qla27xx_dump_{mpi|ram}()")
df617ffbbc5e ("scsi: qla2xxx: Add fw_attr and port_no SysFS node")
64f61d994483 ("scsi: qla2xxx: Add new FW dump template entry types")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b1b9d3825df4c757d653d0b1df66f084835db9c3 Mon Sep 17 00:00:00 2001
From: Bikash Hazarika <bhazarika(a)marvell.com>
Date: Wed, 7 Jun 2023 17:08:42 +0530
Subject: [PATCH] scsi: qla2xxx: Correct the index of array
Klocwork reported array 'port_dstate_str' of size 10 may use index value(s)
10..15.
Add a fix to correct the index of array.
Cc: stable(a)vger.kernel.org
Signed-off-by: Bikash Hazarika <bhazarika(a)marvell.com>
Signed-off-by: Nilesh Javali <njavali(a)marvell.com>
Link: https://lore.kernel.org/r/20230607113843.37185-8-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani(a)oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/qla2xxx/qla_inline.h b/drivers/scsi/qla2xxx/qla_inline.h
index cce6e425c121..946a39504a35 100644
--- a/drivers/scsi/qla2xxx/qla_inline.h
+++ b/drivers/scsi/qla2xxx/qla_inline.h
@@ -109,11 +109,13 @@ qla2x00_set_fcport_disc_state(fc_port_t *fcport, int state)
{
int old_val;
uint8_t shiftbits, mask;
+ uint8_t port_dstate_str_sz;
/* This will have to change when the max no. of states > 16 */
shiftbits = 4;
mask = (1 << shiftbits) - 1;
+ port_dstate_str_sz = sizeof(port_dstate_str) / sizeof(char *);
fcport->disc_state = state;
while (1) {
old_val = atomic_read(&fcport->shadow_disc_state);
@@ -121,7 +123,8 @@ qla2x00_set_fcport_disc_state(fc_port_t *fcport, int state)
old_val, (old_val << shiftbits) | state)) {
ql_dbg(ql_dbg_disc, fcport->vha, 0x2134,
"FCPort %8phC disc_state transition: %s to %s - portid=%06x.\n",
- fcport->port_name, port_dstate_str[old_val & mask],
+ fcport->port_name, (old_val & mask) < port_dstate_str_sz ?
+ port_dstate_str[old_val & mask] : "Unknown",
port_dstate_str[state], fcport->d_id.b24);
return;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x b68710a8094fdffe8dd4f7a82c82649f479bb453
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072125-washbowl-subtitle-bab2@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
b68710a8094f ("scsi: qla2xxx: Fix buffer overrun")
44f5a37d1e3e ("scsi: qla2xxx: Fix buffer-buffer credit extraction error")
897d68eb816b ("scsi: qla2xxx: Fix WARN_ON in qla_nvme_register_hba")
9f2475fe7406 ("scsi: qla2xxx: SAN congestion management implementation")
62e9dd177732 ("scsi: qla2xxx: Change in PUREX to handle FPIN ELS requests")
818dbde78e0f ("Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b68710a8094fdffe8dd4f7a82c82649f479bb453 Mon Sep 17 00:00:00 2001
From: Quinn Tran <qutran(a)marvell.com>
Date: Wed, 7 Jun 2023 17:08:40 +0530
Subject: [PATCH] scsi: qla2xxx: Fix buffer overrun
Klocwork warning: Buffer Overflow - Array Index Out of Bounds
Driver uses fc_els_flogi to calculate size of buffer. The actual buffer is
nested inside of fc_els_flogi which is smaller.
Replace structure name to allow proper size calculation.
Cc: stable(a)vger.kernel.org
Signed-off-by: Quinn Tran <qutran(a)marvell.com>
Signed-off-by: Nilesh Javali <njavali(a)marvell.com>
Link: https://lore.kernel.org/r/20230607113843.37185-6-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani(a)oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
index 0df6eae7324e..b0225f6f3221 100644
--- a/drivers/scsi/qla2xxx/qla_init.c
+++ b/drivers/scsi/qla2xxx/qla_init.c
@@ -5549,7 +5549,7 @@ static void qla_get_login_template(scsi_qla_host_t *vha)
__be32 *q;
memset(ha->init_cb, 0, ha->init_cb_size);
- sz = min_t(int, sizeof(struct fc_els_flogi), ha->init_cb_size);
+ sz = min_t(int, sizeof(struct fc_els_csp), ha->init_cb_size);
rval = qla24xx_get_port_login_templ(vha, ha->init_cb_dma,
ha->init_cb, sz);
if (rval != QLA_SUCCESS) {
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x b68710a8094fdffe8dd4f7a82c82649f479bb453
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072124-cruelty-cruncher-7f60@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
b68710a8094f ("scsi: qla2xxx: Fix buffer overrun")
44f5a37d1e3e ("scsi: qla2xxx: Fix buffer-buffer credit extraction error")
897d68eb816b ("scsi: qla2xxx: Fix WARN_ON in qla_nvme_register_hba")
9f2475fe7406 ("scsi: qla2xxx: SAN congestion management implementation")
62e9dd177732 ("scsi: qla2xxx: Change in PUREX to handle FPIN ELS requests")
818dbde78e0f ("Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b68710a8094fdffe8dd4f7a82c82649f479bb453 Mon Sep 17 00:00:00 2001
From: Quinn Tran <qutran(a)marvell.com>
Date: Wed, 7 Jun 2023 17:08:40 +0530
Subject: [PATCH] scsi: qla2xxx: Fix buffer overrun
Klocwork warning: Buffer Overflow - Array Index Out of Bounds
Driver uses fc_els_flogi to calculate size of buffer. The actual buffer is
nested inside of fc_els_flogi which is smaller.
Replace structure name to allow proper size calculation.
Cc: stable(a)vger.kernel.org
Signed-off-by: Quinn Tran <qutran(a)marvell.com>
Signed-off-by: Nilesh Javali <njavali(a)marvell.com>
Link: https://lore.kernel.org/r/20230607113843.37185-6-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani(a)oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
index 0df6eae7324e..b0225f6f3221 100644
--- a/drivers/scsi/qla2xxx/qla_init.c
+++ b/drivers/scsi/qla2xxx/qla_init.c
@@ -5549,7 +5549,7 @@ static void qla_get_login_template(scsi_qla_host_t *vha)
__be32 *q;
memset(ha->init_cb, 0, ha->init_cb_size);
- sz = min_t(int, sizeof(struct fc_els_flogi), ha->init_cb_size);
+ sz = min_t(int, sizeof(struct fc_els_csp), ha->init_cb_size);
rval = qla24xx_get_port_login_templ(vha, ha->init_cb_dma,
ha->init_cb, sz);
if (rval != QLA_SUCCESS) {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x b68710a8094fdffe8dd4f7a82c82649f479bb453
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072123-oink-gains-2382@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
b68710a8094f ("scsi: qla2xxx: Fix buffer overrun")
44f5a37d1e3e ("scsi: qla2xxx: Fix buffer-buffer credit extraction error")
897d68eb816b ("scsi: qla2xxx: Fix WARN_ON in qla_nvme_register_hba")
9f2475fe7406 ("scsi: qla2xxx: SAN congestion management implementation")
62e9dd177732 ("scsi: qla2xxx: Change in PUREX to handle FPIN ELS requests")
818dbde78e0f ("Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b68710a8094fdffe8dd4f7a82c82649f479bb453 Mon Sep 17 00:00:00 2001
From: Quinn Tran <qutran(a)marvell.com>
Date: Wed, 7 Jun 2023 17:08:40 +0530
Subject: [PATCH] scsi: qla2xxx: Fix buffer overrun
Klocwork warning: Buffer Overflow - Array Index Out of Bounds
Driver uses fc_els_flogi to calculate size of buffer. The actual buffer is
nested inside of fc_els_flogi which is smaller.
Replace structure name to allow proper size calculation.
Cc: stable(a)vger.kernel.org
Signed-off-by: Quinn Tran <qutran(a)marvell.com>
Signed-off-by: Nilesh Javali <njavali(a)marvell.com>
Link: https://lore.kernel.org/r/20230607113843.37185-6-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani(a)oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
index 0df6eae7324e..b0225f6f3221 100644
--- a/drivers/scsi/qla2xxx/qla_init.c
+++ b/drivers/scsi/qla2xxx/qla_init.c
@@ -5549,7 +5549,7 @@ static void qla_get_login_template(scsi_qla_host_t *vha)
__be32 *q;
memset(ha->init_cb, 0, ha->init_cb_size);
- sz = min_t(int, sizeof(struct fc_els_flogi), ha->init_cb_size);
+ sz = min_t(int, sizeof(struct fc_els_csp), ha->init_cb_size);
rval = qla24xx_get_port_login_templ(vha, ha->init_cb_dma,
ha->init_cb, sz);
if (rval != QLA_SUCCESS) {
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 6b504d06976fe4a61cc05dedc68b84fadb397f77
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072105-poking-wasp-4610@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
6b504d06976f ("scsi: qla2xxx: Avoid fcport pointer dereference")
e0fb8ce2bb9e ("scsi: qla2xxx: edif: Fix potential stuck session in sa update")
31e6cdbe0eae ("scsi: qla2xxx: Implement ref count for SRB")
d4523bd6fd5d ("scsi: qla2xxx: Refactor asynchronous command initialization")
2cabf10dbbe3 ("scsi: qla2xxx: Fix hang on NVMe command timeouts")
e3d2612f583b ("scsi: qla2xxx: Fix use after free in debug code")
9efea843a906 ("scsi: qla2xxx: edif: Add detection of secure device")
dd30706e73b7 ("scsi: qla2xxx: edif: Add key update")
fac2807946c1 ("scsi: qla2xxx: edif: Add extraction of auth_els from the wire")
84318a9f01ce ("scsi: qla2xxx: edif: Add send, receive, and accept for auth_els")
7878f22a2e03 ("scsi: qla2xxx: edif: Add getfcinfo and statistic bsgs")
7ebb336e45ef ("scsi: qla2xxx: edif: Add start + stop bsgs")
d94d8158e184 ("scsi: qla2xxx: Add heartbeat check")
f7a0ed479e66 ("scsi: qla2xxx: Fix crash in PCIe error handling")
2ce35c0821af ("scsi: qla2xxx: Fix use after free in bsg")
5777fef788a5 ("scsi: qla2xxx: Consolidate zio threshold setting for both FCP & NVMe")
960204ecca5e ("scsi: qla2xxx: Simplify if statement")
a04658594399 ("scsi: qla2xxx: Wait for ABTS response on I/O timeouts for NVMe")
dbf1f53cfd23 ("scsi: qla2xxx: Implementation to get and manage host, target stats and initiator port")
707531bc2626 ("scsi: qla2xxx: If fcport is undergoing deletion complete I/O with retry")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6b504d06976fe4a61cc05dedc68b84fadb397f77 Mon Sep 17 00:00:00 2001
From: Nilesh Javali <njavali(a)marvell.com>
Date: Wed, 7 Jun 2023 17:08:38 +0530
Subject: [PATCH] scsi: qla2xxx: Avoid fcport pointer dereference
Klocwork reported warning of NULL pointer may be dereferenced. The routine
exits when sa_ctl is NULL and fcport is allocated after the exit call thus
causing NULL fcport pointer to dereference at the time of exit.
To avoid fcport pointer dereference, exit the routine when sa_ctl is NULL.
Cc: stable(a)vger.kernel.org
Signed-off-by: Nilesh Javali <njavali(a)marvell.com>
Link: https://lore.kernel.org/r/20230607113843.37185-4-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani(a)oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/qla2xxx/qla_edif.c b/drivers/scsi/qla2xxx/qla_edif.c
index ec0e20255bd3..26e6b3e3af43 100644
--- a/drivers/scsi/qla2xxx/qla_edif.c
+++ b/drivers/scsi/qla2xxx/qla_edif.c
@@ -2361,8 +2361,8 @@ qla24xx_issue_sa_replace_iocb(scsi_qla_host_t *vha, struct qla_work_evt *e)
if (!sa_ctl) {
ql_dbg(ql_dbg_edif, vha, 0x70e6,
"sa_ctl allocation failed\n");
- rval = -ENOMEM;
- goto done;
+ rval = -ENOMEM;
+ return rval;
}
fcport = sa_ctl->fcport;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 6b504d06976fe4a61cc05dedc68b84fadb397f77
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072104-brilliant-museum-198a@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
6b504d06976f ("scsi: qla2xxx: Avoid fcport pointer dereference")
e0fb8ce2bb9e ("scsi: qla2xxx: edif: Fix potential stuck session in sa update")
31e6cdbe0eae ("scsi: qla2xxx: Implement ref count for SRB")
d4523bd6fd5d ("scsi: qla2xxx: Refactor asynchronous command initialization")
2cabf10dbbe3 ("scsi: qla2xxx: Fix hang on NVMe command timeouts")
e3d2612f583b ("scsi: qla2xxx: Fix use after free in debug code")
9efea843a906 ("scsi: qla2xxx: edif: Add detection of secure device")
dd30706e73b7 ("scsi: qla2xxx: edif: Add key update")
fac2807946c1 ("scsi: qla2xxx: edif: Add extraction of auth_els from the wire")
84318a9f01ce ("scsi: qla2xxx: edif: Add send, receive, and accept for auth_els")
7878f22a2e03 ("scsi: qla2xxx: edif: Add getfcinfo and statistic bsgs")
7ebb336e45ef ("scsi: qla2xxx: edif: Add start + stop bsgs")
d94d8158e184 ("scsi: qla2xxx: Add heartbeat check")
f7a0ed479e66 ("scsi: qla2xxx: Fix crash in PCIe error handling")
2ce35c0821af ("scsi: qla2xxx: Fix use after free in bsg")
5777fef788a5 ("scsi: qla2xxx: Consolidate zio threshold setting for both FCP & NVMe")
960204ecca5e ("scsi: qla2xxx: Simplify if statement")
a04658594399 ("scsi: qla2xxx: Wait for ABTS response on I/O timeouts for NVMe")
dbf1f53cfd23 ("scsi: qla2xxx: Implementation to get and manage host, target stats and initiator port")
707531bc2626 ("scsi: qla2xxx: If fcport is undergoing deletion complete I/O with retry")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6b504d06976fe4a61cc05dedc68b84fadb397f77 Mon Sep 17 00:00:00 2001
From: Nilesh Javali <njavali(a)marvell.com>
Date: Wed, 7 Jun 2023 17:08:38 +0530
Subject: [PATCH] scsi: qla2xxx: Avoid fcport pointer dereference
Klocwork reported warning of NULL pointer may be dereferenced. The routine
exits when sa_ctl is NULL and fcport is allocated after the exit call thus
causing NULL fcport pointer to dereference at the time of exit.
To avoid fcport pointer dereference, exit the routine when sa_ctl is NULL.
Cc: stable(a)vger.kernel.org
Signed-off-by: Nilesh Javali <njavali(a)marvell.com>
Link: https://lore.kernel.org/r/20230607113843.37185-4-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani(a)oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/qla2xxx/qla_edif.c b/drivers/scsi/qla2xxx/qla_edif.c
index ec0e20255bd3..26e6b3e3af43 100644
--- a/drivers/scsi/qla2xxx/qla_edif.c
+++ b/drivers/scsi/qla2xxx/qla_edif.c
@@ -2361,8 +2361,8 @@ qla24xx_issue_sa_replace_iocb(scsi_qla_host_t *vha, struct qla_work_evt *e)
if (!sa_ctl) {
ql_dbg(ql_dbg_edif, vha, 0x70e6,
"sa_ctl allocation failed\n");
- rval = -ENOMEM;
- goto done;
+ rval = -ENOMEM;
+ return rval;
}
fcport = sa_ctl->fcport;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 6b504d06976fe4a61cc05dedc68b84fadb397f77
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072104-tidiness-facing-d23a@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
6b504d06976f ("scsi: qla2xxx: Avoid fcport pointer dereference")
e0fb8ce2bb9e ("scsi: qla2xxx: edif: Fix potential stuck session in sa update")
31e6cdbe0eae ("scsi: qla2xxx: Implement ref count for SRB")
d4523bd6fd5d ("scsi: qla2xxx: Refactor asynchronous command initialization")
2cabf10dbbe3 ("scsi: qla2xxx: Fix hang on NVMe command timeouts")
e3d2612f583b ("scsi: qla2xxx: Fix use after free in debug code")
9efea843a906 ("scsi: qla2xxx: edif: Add detection of secure device")
dd30706e73b7 ("scsi: qla2xxx: edif: Add key update")
fac2807946c1 ("scsi: qla2xxx: edif: Add extraction of auth_els from the wire")
84318a9f01ce ("scsi: qla2xxx: edif: Add send, receive, and accept for auth_els")
7878f22a2e03 ("scsi: qla2xxx: edif: Add getfcinfo and statistic bsgs")
7ebb336e45ef ("scsi: qla2xxx: edif: Add start + stop bsgs")
d94d8158e184 ("scsi: qla2xxx: Add heartbeat check")
f7a0ed479e66 ("scsi: qla2xxx: Fix crash in PCIe error handling")
2ce35c0821af ("scsi: qla2xxx: Fix use after free in bsg")
5777fef788a5 ("scsi: qla2xxx: Consolidate zio threshold setting for both FCP & NVMe")
960204ecca5e ("scsi: qla2xxx: Simplify if statement")
a04658594399 ("scsi: qla2xxx: Wait for ABTS response on I/O timeouts for NVMe")
dbf1f53cfd23 ("scsi: qla2xxx: Implementation to get and manage host, target stats and initiator port")
707531bc2626 ("scsi: qla2xxx: If fcport is undergoing deletion complete I/O with retry")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6b504d06976fe4a61cc05dedc68b84fadb397f77 Mon Sep 17 00:00:00 2001
From: Nilesh Javali <njavali(a)marvell.com>
Date: Wed, 7 Jun 2023 17:08:38 +0530
Subject: [PATCH] scsi: qla2xxx: Avoid fcport pointer dereference
Klocwork reported warning of NULL pointer may be dereferenced. The routine
exits when sa_ctl is NULL and fcport is allocated after the exit call thus
causing NULL fcport pointer to dereference at the time of exit.
To avoid fcport pointer dereference, exit the routine when sa_ctl is NULL.
Cc: stable(a)vger.kernel.org
Signed-off-by: Nilesh Javali <njavali(a)marvell.com>
Link: https://lore.kernel.org/r/20230607113843.37185-4-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani(a)oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/qla2xxx/qla_edif.c b/drivers/scsi/qla2xxx/qla_edif.c
index ec0e20255bd3..26e6b3e3af43 100644
--- a/drivers/scsi/qla2xxx/qla_edif.c
+++ b/drivers/scsi/qla2xxx/qla_edif.c
@@ -2361,8 +2361,8 @@ qla24xx_issue_sa_replace_iocb(scsi_qla_host_t *vha, struct qla_work_evt *e)
if (!sa_ctl) {
ql_dbg(ql_dbg_edif, vha, 0x70e6,
"sa_ctl allocation failed\n");
- rval = -ENOMEM;
- goto done;
+ rval = -ENOMEM;
+ return rval;
}
fcport = sa_ctl->fcport;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 6b504d06976fe4a61cc05dedc68b84fadb397f77
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072103-perceive-corrosive-fe47@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
6b504d06976f ("scsi: qla2xxx: Avoid fcport pointer dereference")
e0fb8ce2bb9e ("scsi: qla2xxx: edif: Fix potential stuck session in sa update")
31e6cdbe0eae ("scsi: qla2xxx: Implement ref count for SRB")
d4523bd6fd5d ("scsi: qla2xxx: Refactor asynchronous command initialization")
2cabf10dbbe3 ("scsi: qla2xxx: Fix hang on NVMe command timeouts")
e3d2612f583b ("scsi: qla2xxx: Fix use after free in debug code")
9efea843a906 ("scsi: qla2xxx: edif: Add detection of secure device")
dd30706e73b7 ("scsi: qla2xxx: edif: Add key update")
fac2807946c1 ("scsi: qla2xxx: edif: Add extraction of auth_els from the wire")
84318a9f01ce ("scsi: qla2xxx: edif: Add send, receive, and accept for auth_els")
7878f22a2e03 ("scsi: qla2xxx: edif: Add getfcinfo and statistic bsgs")
7ebb336e45ef ("scsi: qla2xxx: edif: Add start + stop bsgs")
d94d8158e184 ("scsi: qla2xxx: Add heartbeat check")
f7a0ed479e66 ("scsi: qla2xxx: Fix crash in PCIe error handling")
2ce35c0821af ("scsi: qla2xxx: Fix use after free in bsg")
5777fef788a5 ("scsi: qla2xxx: Consolidate zio threshold setting for both FCP & NVMe")
960204ecca5e ("scsi: qla2xxx: Simplify if statement")
a04658594399 ("scsi: qla2xxx: Wait for ABTS response on I/O timeouts for NVMe")
dbf1f53cfd23 ("scsi: qla2xxx: Implementation to get and manage host, target stats and initiator port")
707531bc2626 ("scsi: qla2xxx: If fcport is undergoing deletion complete I/O with retry")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6b504d06976fe4a61cc05dedc68b84fadb397f77 Mon Sep 17 00:00:00 2001
From: Nilesh Javali <njavali(a)marvell.com>
Date: Wed, 7 Jun 2023 17:08:38 +0530
Subject: [PATCH] scsi: qla2xxx: Avoid fcport pointer dereference
Klocwork reported warning of NULL pointer may be dereferenced. The routine
exits when sa_ctl is NULL and fcport is allocated after the exit call thus
causing NULL fcport pointer to dereference at the time of exit.
To avoid fcport pointer dereference, exit the routine when sa_ctl is NULL.
Cc: stable(a)vger.kernel.org
Signed-off-by: Nilesh Javali <njavali(a)marvell.com>
Link: https://lore.kernel.org/r/20230607113843.37185-4-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani(a)oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/qla2xxx/qla_edif.c b/drivers/scsi/qla2xxx/qla_edif.c
index ec0e20255bd3..26e6b3e3af43 100644
--- a/drivers/scsi/qla2xxx/qla_edif.c
+++ b/drivers/scsi/qla2xxx/qla_edif.c
@@ -2361,8 +2361,8 @@ qla24xx_issue_sa_replace_iocb(scsi_qla_host_t *vha, struct qla_work_evt *e)
if (!sa_ctl) {
ql_dbg(ql_dbg_edif, vha, 0x70e6,
"sa_ctl allocation failed\n");
- rval = -ENOMEM;
- goto done;
+ rval = -ENOMEM;
+ return rval;
}
fcport = sa_ctl->fcport;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x d721b591b95cf3f290f8a7cbe90aa2ee0368388d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072141-freezable-tactical-2c4a@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
d721b591b95c ("scsi: qla2xxx: Array index may go out of bound")
250bd00923c7 ("scsi: qla2xxx: Fix inconsistent format argument type in qla_os.c")
a4239945b8ad ("scsi: qla2xxx: Add switch command to simplify fabric discovery")
9cd883f07a54 ("scsi: qla2xxx: Fix session cleanup for N2N")
82abdcaf3ede ("scsi: qla2xxx: Allow target mode to accept PRLI in dual mode")
11aea16ab3f5 ("scsi: qla2xxx: Add ability to send PRLO")
9b3e0f4d4147 ("scsi: qla2xxx: Move work element processing out of DPC thread")
f13515acdcb5 ("scsi: qla2xxx: Replace GPDB with async ADISC command")
2853192e154b ("scsi: qla2xxx: Use IOCB path to submit Control VP MBX command")
4005a995668b ("scsi: qla2xxx: Fix Relogin being triggered too fast")
5ef696aa9f3c ("scsi: qla2xxx: Relogin to target port on a cable swap")
414d9ff3f803 ("scsi: qla2xxx: Fix login state machine stuck at GPDB")
2d73ac6102d9 ("scsi: qla2xxx: Serialize GPNID for multiple RSCN")
25ad76b703d9 ("scsi: qla2xxx: Retry switch command on time out")
a084fd68e1d2 ("scsi: qla2xxx: Fix re-login for Nport Handle in use")
a01c77d2cbc4 ("scsi: qla2xxx: Move session delete to driver work queue")
2d57b5efda51 ("scsi: qla2xxx: Query FC4 type during RSCN processing")
edd05de19759 ("scsi: qla2xxx: Changes to support N2N logins")
c0c462c8a061 ("scsi: qla2xxx: Allow MBC_GET_PORT_DATABASE to query and save the port states")
08eb7f45de61 ("scsi: qla2xxx: Cocci spatch "pool_zalloc-simple"")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d721b591b95cf3f290f8a7cbe90aa2ee0368388d Mon Sep 17 00:00:00 2001
From: Nilesh Javali <njavali(a)marvell.com>
Date: Wed, 7 Jun 2023 17:08:36 +0530
Subject: [PATCH] scsi: qla2xxx: Array index may go out of bound
Klocwork reports array 'vha->host_str' of size 16 may use index value(s)
16..19. Use snprintf() instead of sprintf().
Cc: stable(a)vger.kernel.org
Co-developed-by: Bikash Hazarika <bhazarika(a)marvell.com>
Signed-off-by: Bikash Hazarika <bhazarika(a)marvell.com>
Signed-off-by: Nilesh Javali <njavali(a)marvell.com>
Link: https://lore.kernel.org/r/20230607113843.37185-2-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani(a)oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index bc89d3da8fd0..3bace9ea6288 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -5088,7 +5088,8 @@ struct scsi_qla_host *qla2x00_create_host(const struct scsi_host_template *sht,
}
INIT_DELAYED_WORK(&vha->scan.scan_work, qla_scan_work_fn);
- sprintf(vha->host_str, "%s_%lu", QLA2XXX_DRIVER_NAME, vha->host_no);
+ snprintf(vha->host_str, sizeof(vha->host_str), "%s_%lu",
+ QLA2XXX_DRIVER_NAME, vha->host_no);
ql_dbg(ql_dbg_init, vha, 0x0041,
"Allocated the host=%p hw=%p vha=%p dev_name=%s",
vha->host, vha->hw, vha,
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x d721b591b95cf3f290f8a7cbe90aa2ee0368388d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072140-dilute-stood-1935@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
d721b591b95c ("scsi: qla2xxx: Array index may go out of bound")
250bd00923c7 ("scsi: qla2xxx: Fix inconsistent format argument type in qla_os.c")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d721b591b95cf3f290f8a7cbe90aa2ee0368388d Mon Sep 17 00:00:00 2001
From: Nilesh Javali <njavali(a)marvell.com>
Date: Wed, 7 Jun 2023 17:08:36 +0530
Subject: [PATCH] scsi: qla2xxx: Array index may go out of bound
Klocwork reports array 'vha->host_str' of size 16 may use index value(s)
16..19. Use snprintf() instead of sprintf().
Cc: stable(a)vger.kernel.org
Co-developed-by: Bikash Hazarika <bhazarika(a)marvell.com>
Signed-off-by: Bikash Hazarika <bhazarika(a)marvell.com>
Signed-off-by: Nilesh Javali <njavali(a)marvell.com>
Link: https://lore.kernel.org/r/20230607113843.37185-2-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani(a)oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index bc89d3da8fd0..3bace9ea6288 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -5088,7 +5088,8 @@ struct scsi_qla_host *qla2x00_create_host(const struct scsi_host_template *sht,
}
INIT_DELAYED_WORK(&vha->scan.scan_work, qla_scan_work_fn);
- sprintf(vha->host_str, "%s_%lu", QLA2XXX_DRIVER_NAME, vha->host_no);
+ snprintf(vha->host_str, sizeof(vha->host_str), "%s_%lu",
+ QLA2XXX_DRIVER_NAME, vha->host_no);
ql_dbg(ql_dbg_init, vha, 0x0041,
"Allocated the host=%p hw=%p vha=%p dev_name=%s",
vha->host, vha->hw, vha,
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x d721b591b95cf3f290f8a7cbe90aa2ee0368388d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072139-seismic-unreached-ff9a@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
d721b591b95c ("scsi: qla2xxx: Array index may go out of bound")
250bd00923c7 ("scsi: qla2xxx: Fix inconsistent format argument type in qla_os.c")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d721b591b95cf3f290f8a7cbe90aa2ee0368388d Mon Sep 17 00:00:00 2001
From: Nilesh Javali <njavali(a)marvell.com>
Date: Wed, 7 Jun 2023 17:08:36 +0530
Subject: [PATCH] scsi: qla2xxx: Array index may go out of bound
Klocwork reports array 'vha->host_str' of size 16 may use index value(s)
16..19. Use snprintf() instead of sprintf().
Cc: stable(a)vger.kernel.org
Co-developed-by: Bikash Hazarika <bhazarika(a)marvell.com>
Signed-off-by: Bikash Hazarika <bhazarika(a)marvell.com>
Signed-off-by: Nilesh Javali <njavali(a)marvell.com>
Link: https://lore.kernel.org/r/20230607113843.37185-2-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani(a)oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index bc89d3da8fd0..3bace9ea6288 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -5088,7 +5088,8 @@ struct scsi_qla_host *qla2x00_create_host(const struct scsi_host_template *sht,
}
INIT_DELAYED_WORK(&vha->scan.scan_work, qla_scan_work_fn);
- sprintf(vha->host_str, "%s_%lu", QLA2XXX_DRIVER_NAME, vha->host_no);
+ snprintf(vha->host_str, sizeof(vha->host_str), "%s_%lu",
+ QLA2XXX_DRIVER_NAME, vha->host_no);
ql_dbg(ql_dbg_init, vha, 0x0041,
"Allocated the host=%p hw=%p vha=%p dev_name=%s",
vha->host, vha->hw, vha,
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 797311bce5c2ac90b8d65e357603cfd410d36ebb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072106-theme-headache-d3ba@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
797311bce5c2 ("tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails")
4ed8f337dee3 ("Revert "tracing: Add "(fault)" name injection to kernel probes"")
e38e2c6a9efc ("tracing/probes: Fix to update dynamic data counter if fetcharg uses it")
00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces")
f1d3cbfaafc1 ("tracing: Move duplicate code of trace_kprobe/eprobe.c into header")
7491e2c44278 ("tracing: Add a probe that attaches to trace events")
8565a45d0858 ("tracing/probes: Have process_fetch_insn() take a void * instead of pt_regs")
007517a01995 ("tracing/probe: Change traceprobe_set_print_fmt() to take a type")
3b13911a2fd0 ("tracing: Synthetic event field_pos is an index not a boolean")
bc87cf0a08d4 ("trace: Add a generic function to read/write u64 values from tracefs")
d262271d0483 ("tracing/dynevent: Delegate parsing to create function")
d4d704637d93 ("tracing: Add synthetic event error logging")
9bbb33291f8e ("tracing: Check that the synthetic event and field names are legal")
42d120e2dda5 ("tracing: Move is_good_name() from trace_probe.h to trace.h")
8db4d6bfbbf9 ("tracing: Change synthetic event string format to limit printed length")
bd82631d7ccd ("tracing: Add support for dynamic strings to synthetic events")
8fbeb52a598c ("tracing: Fix parse_synth_field() error handling")
8b6ddd10d678 ("Merge tag 'trace-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 797311bce5c2ac90b8d65e357603cfd410d36ebb Mon Sep 17 00:00:00 2001
From: "Masami Hiramatsu (Google)" <mhiramat(a)kernel.org>
Date: Tue, 11 Jul 2023 23:16:07 +0900
Subject: [PATCH] tracing/probes: Fix to record 0-length data_loc in
fetch_store_string*() if fails
Fix to record 0-length data to data_loc in fetch_store_string*() if it fails
to get the string data.
Currently those expect that the data_loc is updated by store_trace_args() if
it returns the error code. However, that does not work correctly if the
argument is an array of strings. In that case, store_trace_args() only clears
the first entry of the array (which may have no error) and leaves other
entries. So it should be cleared by fetch_store_string*() itself.
Also, 'dyndata' and 'maxlen' in store_trace_args() should be updated
only if it is used (ret > 0 and argument is a dynamic data.)
Link: https://lore.kernel.org/all/168908496683.123124.4761206188794205601.stgit@d…
Fixes: 40b53b771806 ("tracing: probeevent: Add array type support")
Cc: stable(a)vger.kernel.org
Reviewed-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
diff --git a/kernel/trace/trace_probe_kernel.h b/kernel/trace/trace_probe_kernel.h
index 6deae2ce34f8..bb723eefd7b7 100644
--- a/kernel/trace/trace_probe_kernel.h
+++ b/kernel/trace/trace_probe_kernel.h
@@ -37,6 +37,13 @@ fetch_store_strlen(unsigned long addr)
return (ret < 0) ? ret : len;
}
+static nokprobe_inline void set_data_loc(int ret, void *dest, void *__dest, void *base)
+{
+ if (ret < 0)
+ ret = 0;
+ *(u32 *)dest = make_data_loc(ret, __dest - base);
+}
+
/*
* Fetch a null-terminated string from user. Caller MUST set *(u32 *)buf
* with max length and relative data location.
@@ -55,8 +62,7 @@ fetch_store_string_user(unsigned long addr, void *dest, void *base)
__dest = get_loc_data(dest, base);
ret = strncpy_from_user_nofault(__dest, uaddr, maxlen);
- if (ret >= 0)
- *(u32 *)dest = make_data_loc(ret, __dest - base);
+ set_data_loc(ret, dest, __dest, base);
return ret;
}
@@ -87,8 +93,7 @@ fetch_store_string(unsigned long addr, void *dest, void *base)
* probing.
*/
ret = strncpy_from_kernel_nofault(__dest, (void *)addr, maxlen);
- if (ret >= 0)
- *(u32 *)dest = make_data_loc(ret, __dest - base);
+ set_data_loc(ret, dest, __dest, base);
return ret;
}
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index 185da001f4c3..3935b347f874 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -267,13 +267,9 @@ store_trace_args(void *data, struct trace_probe *tp, void *rec,
if (unlikely(arg->dynamic))
*dl = make_data_loc(maxlen, dyndata - base);
ret = process_fetch_insn(arg->code, rec, dl, base);
- if (arg->dynamic) {
- if (unlikely(ret < 0)) {
- *dl = make_data_loc(0, dyndata - base);
- } else {
- dyndata += ret;
- maxlen -= ret;
- }
+ if (arg->dynamic && likely(ret > 0)) {
+ dyndata += ret;
+ maxlen -= ret;
}
}
}
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 8b92e34ff0c8..7b47e9a2c010 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -170,7 +170,8 @@ fetch_store_string(unsigned long addr, void *dest, void *base)
*/
ret++;
*(u32 *)dest = make_data_loc(ret, (void *)dst - base);
- }
+ } else
+ *(u32 *)dest = make_data_loc(0, (void *)dst - base);
return ret;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 797311bce5c2ac90b8d65e357603cfd410d36ebb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072105-drop-down-yearling-a66f@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
797311bce5c2 ("tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails")
4ed8f337dee3 ("Revert "tracing: Add "(fault)" name injection to kernel probes"")
e38e2c6a9efc ("tracing/probes: Fix to update dynamic data counter if fetcharg uses it")
00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces")
f1d3cbfaafc1 ("tracing: Move duplicate code of trace_kprobe/eprobe.c into header")
7491e2c44278 ("tracing: Add a probe that attaches to trace events")
8565a45d0858 ("tracing/probes: Have process_fetch_insn() take a void * instead of pt_regs")
007517a01995 ("tracing/probe: Change traceprobe_set_print_fmt() to take a type")
3b13911a2fd0 ("tracing: Synthetic event field_pos is an index not a boolean")
bc87cf0a08d4 ("trace: Add a generic function to read/write u64 values from tracefs")
d262271d0483 ("tracing/dynevent: Delegate parsing to create function")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 797311bce5c2ac90b8d65e357603cfd410d36ebb Mon Sep 17 00:00:00 2001
From: "Masami Hiramatsu (Google)" <mhiramat(a)kernel.org>
Date: Tue, 11 Jul 2023 23:16:07 +0900
Subject: [PATCH] tracing/probes: Fix to record 0-length data_loc in
fetch_store_string*() if fails
Fix to record 0-length data to data_loc in fetch_store_string*() if it fails
to get the string data.
Currently those expect that the data_loc is updated by store_trace_args() if
it returns the error code. However, that does not work correctly if the
argument is an array of strings. In that case, store_trace_args() only clears
the first entry of the array (which may have no error) and leaves other
entries. So it should be cleared by fetch_store_string*() itself.
Also, 'dyndata' and 'maxlen' in store_trace_args() should be updated
only if it is used (ret > 0 and argument is a dynamic data.)
Link: https://lore.kernel.org/all/168908496683.123124.4761206188794205601.stgit@d…
Fixes: 40b53b771806 ("tracing: probeevent: Add array type support")
Cc: stable(a)vger.kernel.org
Reviewed-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
diff --git a/kernel/trace/trace_probe_kernel.h b/kernel/trace/trace_probe_kernel.h
index 6deae2ce34f8..bb723eefd7b7 100644
--- a/kernel/trace/trace_probe_kernel.h
+++ b/kernel/trace/trace_probe_kernel.h
@@ -37,6 +37,13 @@ fetch_store_strlen(unsigned long addr)
return (ret < 0) ? ret : len;
}
+static nokprobe_inline void set_data_loc(int ret, void *dest, void *__dest, void *base)
+{
+ if (ret < 0)
+ ret = 0;
+ *(u32 *)dest = make_data_loc(ret, __dest - base);
+}
+
/*
* Fetch a null-terminated string from user. Caller MUST set *(u32 *)buf
* with max length and relative data location.
@@ -55,8 +62,7 @@ fetch_store_string_user(unsigned long addr, void *dest, void *base)
__dest = get_loc_data(dest, base);
ret = strncpy_from_user_nofault(__dest, uaddr, maxlen);
- if (ret >= 0)
- *(u32 *)dest = make_data_loc(ret, __dest - base);
+ set_data_loc(ret, dest, __dest, base);
return ret;
}
@@ -87,8 +93,7 @@ fetch_store_string(unsigned long addr, void *dest, void *base)
* probing.
*/
ret = strncpy_from_kernel_nofault(__dest, (void *)addr, maxlen);
- if (ret >= 0)
- *(u32 *)dest = make_data_loc(ret, __dest - base);
+ set_data_loc(ret, dest, __dest, base);
return ret;
}
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index 185da001f4c3..3935b347f874 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -267,13 +267,9 @@ store_trace_args(void *data, struct trace_probe *tp, void *rec,
if (unlikely(arg->dynamic))
*dl = make_data_loc(maxlen, dyndata - base);
ret = process_fetch_insn(arg->code, rec, dl, base);
- if (arg->dynamic) {
- if (unlikely(ret < 0)) {
- *dl = make_data_loc(0, dyndata - base);
- } else {
- dyndata += ret;
- maxlen -= ret;
- }
+ if (arg->dynamic && likely(ret > 0)) {
+ dyndata += ret;
+ maxlen -= ret;
}
}
}
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 8b92e34ff0c8..7b47e9a2c010 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -170,7 +170,8 @@ fetch_store_string(unsigned long addr, void *dest, void *base)
*/
ret++;
*(u32 *)dest = make_data_loc(ret, (void *)dst - base);
- }
+ } else
+ *(u32 *)dest = make_data_loc(0, (void *)dst - base);
return ret;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 797311bce5c2ac90b8d65e357603cfd410d36ebb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072104-epilepsy-remedial-bdd8@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
797311bce5c2 ("tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails")
4ed8f337dee3 ("Revert "tracing: Add "(fault)" name injection to kernel probes"")
e38e2c6a9efc ("tracing/probes: Fix to update dynamic data counter if fetcharg uses it")
00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces")
f1d3cbfaafc1 ("tracing: Move duplicate code of trace_kprobe/eprobe.c into header")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 797311bce5c2ac90b8d65e357603cfd410d36ebb Mon Sep 17 00:00:00 2001
From: "Masami Hiramatsu (Google)" <mhiramat(a)kernel.org>
Date: Tue, 11 Jul 2023 23:16:07 +0900
Subject: [PATCH] tracing/probes: Fix to record 0-length data_loc in
fetch_store_string*() if fails
Fix to record 0-length data to data_loc in fetch_store_string*() if it fails
to get the string data.
Currently those expect that the data_loc is updated by store_trace_args() if
it returns the error code. However, that does not work correctly if the
argument is an array of strings. In that case, store_trace_args() only clears
the first entry of the array (which may have no error) and leaves other
entries. So it should be cleared by fetch_store_string*() itself.
Also, 'dyndata' and 'maxlen' in store_trace_args() should be updated
only if it is used (ret > 0 and argument is a dynamic data.)
Link: https://lore.kernel.org/all/168908496683.123124.4761206188794205601.stgit@d…
Fixes: 40b53b771806 ("tracing: probeevent: Add array type support")
Cc: stable(a)vger.kernel.org
Reviewed-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
diff --git a/kernel/trace/trace_probe_kernel.h b/kernel/trace/trace_probe_kernel.h
index 6deae2ce34f8..bb723eefd7b7 100644
--- a/kernel/trace/trace_probe_kernel.h
+++ b/kernel/trace/trace_probe_kernel.h
@@ -37,6 +37,13 @@ fetch_store_strlen(unsigned long addr)
return (ret < 0) ? ret : len;
}
+static nokprobe_inline void set_data_loc(int ret, void *dest, void *__dest, void *base)
+{
+ if (ret < 0)
+ ret = 0;
+ *(u32 *)dest = make_data_loc(ret, __dest - base);
+}
+
/*
* Fetch a null-terminated string from user. Caller MUST set *(u32 *)buf
* with max length and relative data location.
@@ -55,8 +62,7 @@ fetch_store_string_user(unsigned long addr, void *dest, void *base)
__dest = get_loc_data(dest, base);
ret = strncpy_from_user_nofault(__dest, uaddr, maxlen);
- if (ret >= 0)
- *(u32 *)dest = make_data_loc(ret, __dest - base);
+ set_data_loc(ret, dest, __dest, base);
return ret;
}
@@ -87,8 +93,7 @@ fetch_store_string(unsigned long addr, void *dest, void *base)
* probing.
*/
ret = strncpy_from_kernel_nofault(__dest, (void *)addr, maxlen);
- if (ret >= 0)
- *(u32 *)dest = make_data_loc(ret, __dest - base);
+ set_data_loc(ret, dest, __dest, base);
return ret;
}
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index 185da001f4c3..3935b347f874 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -267,13 +267,9 @@ store_trace_args(void *data, struct trace_probe *tp, void *rec,
if (unlikely(arg->dynamic))
*dl = make_data_loc(maxlen, dyndata - base);
ret = process_fetch_insn(arg->code, rec, dl, base);
- if (arg->dynamic) {
- if (unlikely(ret < 0)) {
- *dl = make_data_loc(0, dyndata - base);
- } else {
- dyndata += ret;
- maxlen -= ret;
- }
+ if (arg->dynamic && likely(ret > 0)) {
+ dyndata += ret;
+ maxlen -= ret;
}
}
}
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 8b92e34ff0c8..7b47e9a2c010 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -170,7 +170,8 @@ fetch_store_string(unsigned long addr, void *dest, void *base)
*/
ret++;
*(u32 *)dest = make_data_loc(ret, (void *)dst - base);
- }
+ } else
+ *(u32 *)dest = make_data_loc(0, (void *)dst - base);
return ret;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 797311bce5c2ac90b8d65e357603cfd410d36ebb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072103-huff-flyable-0350@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
797311bce5c2 ("tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails")
4ed8f337dee3 ("Revert "tracing: Add "(fault)" name injection to kernel probes"")
e38e2c6a9efc ("tracing/probes: Fix to update dynamic data counter if fetcharg uses it")
00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 797311bce5c2ac90b8d65e357603cfd410d36ebb Mon Sep 17 00:00:00 2001
From: "Masami Hiramatsu (Google)" <mhiramat(a)kernel.org>
Date: Tue, 11 Jul 2023 23:16:07 +0900
Subject: [PATCH] tracing/probes: Fix to record 0-length data_loc in
fetch_store_string*() if fails
Fix to record 0-length data to data_loc in fetch_store_string*() if it fails
to get the string data.
Currently those expect that the data_loc is updated by store_trace_args() if
it returns the error code. However, that does not work correctly if the
argument is an array of strings. In that case, store_trace_args() only clears
the first entry of the array (which may have no error) and leaves other
entries. So it should be cleared by fetch_store_string*() itself.
Also, 'dyndata' and 'maxlen' in store_trace_args() should be updated
only if it is used (ret > 0 and argument is a dynamic data.)
Link: https://lore.kernel.org/all/168908496683.123124.4761206188794205601.stgit@d…
Fixes: 40b53b771806 ("tracing: probeevent: Add array type support")
Cc: stable(a)vger.kernel.org
Reviewed-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
diff --git a/kernel/trace/trace_probe_kernel.h b/kernel/trace/trace_probe_kernel.h
index 6deae2ce34f8..bb723eefd7b7 100644
--- a/kernel/trace/trace_probe_kernel.h
+++ b/kernel/trace/trace_probe_kernel.h
@@ -37,6 +37,13 @@ fetch_store_strlen(unsigned long addr)
return (ret < 0) ? ret : len;
}
+static nokprobe_inline void set_data_loc(int ret, void *dest, void *__dest, void *base)
+{
+ if (ret < 0)
+ ret = 0;
+ *(u32 *)dest = make_data_loc(ret, __dest - base);
+}
+
/*
* Fetch a null-terminated string from user. Caller MUST set *(u32 *)buf
* with max length and relative data location.
@@ -55,8 +62,7 @@ fetch_store_string_user(unsigned long addr, void *dest, void *base)
__dest = get_loc_data(dest, base);
ret = strncpy_from_user_nofault(__dest, uaddr, maxlen);
- if (ret >= 0)
- *(u32 *)dest = make_data_loc(ret, __dest - base);
+ set_data_loc(ret, dest, __dest, base);
return ret;
}
@@ -87,8 +93,7 @@ fetch_store_string(unsigned long addr, void *dest, void *base)
* probing.
*/
ret = strncpy_from_kernel_nofault(__dest, (void *)addr, maxlen);
- if (ret >= 0)
- *(u32 *)dest = make_data_loc(ret, __dest - base);
+ set_data_loc(ret, dest, __dest, base);
return ret;
}
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index 185da001f4c3..3935b347f874 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -267,13 +267,9 @@ store_trace_args(void *data, struct trace_probe *tp, void *rec,
if (unlikely(arg->dynamic))
*dl = make_data_loc(maxlen, dyndata - base);
ret = process_fetch_insn(arg->code, rec, dl, base);
- if (arg->dynamic) {
- if (unlikely(ret < 0)) {
- *dl = make_data_loc(0, dyndata - base);
- } else {
- dyndata += ret;
- maxlen -= ret;
- }
+ if (arg->dynamic && likely(ret > 0)) {
+ dyndata += ret;
+ maxlen -= ret;
}
}
}
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 8b92e34ff0c8..7b47e9a2c010 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -170,7 +170,8 @@ fetch_store_string(unsigned long addr, void *dest, void *base)
*/
ret++;
*(u32 *)dest = make_data_loc(ret, (void *)dst - base);
- }
+ } else
+ *(u32 *)dest = make_data_loc(0, (void *)dst - base);
return ret;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x e38e2c6a9efc435f9de344b7c91f7697e01b47d5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072114-brownnose-browbeat-6258@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
e38e2c6a9efc ("tracing/probes: Fix to update dynamic data counter if fetcharg uses it")
8565a45d0858 ("tracing/probes: Have process_fetch_insn() take a void * instead of pt_regs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e38e2c6a9efc435f9de344b7c91f7697e01b47d5 Mon Sep 17 00:00:00 2001
From: "Masami Hiramatsu (Google)" <mhiramat(a)kernel.org>
Date: Tue, 11 Jul 2023 23:15:48 +0900
Subject: [PATCH] tracing/probes: Fix to update dynamic data counter if
fetcharg uses it
Fix to update dynamic data counter ('dyndata') and max length ('maxlen')
only if the fetcharg uses the dynamic data. Also get out arg->dynamic
from unlikely(). This makes dynamic data address wrong if
process_fetch_insn() returns error on !arg->dynamic case.
Link: https://lore.kernel.org/all/168908494781.123124.8160245359962103684.stgit@d…
Suggested-by: Steven Rostedt <rostedt(a)goodmis.org>
Link: https://lore.kernel.org/all/20230710233400.5aaf024e@gandalf.local.home/
Fixes: 9178412ddf5a ("tracing: probeevent: Return consumed bytes of dynamic area")
Cc: stable(a)vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index ed9d57c6b041..185da001f4c3 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -267,11 +267,13 @@ store_trace_args(void *data, struct trace_probe *tp, void *rec,
if (unlikely(arg->dynamic))
*dl = make_data_loc(maxlen, dyndata - base);
ret = process_fetch_insn(arg->code, rec, dl, base);
- if (unlikely(ret < 0 && arg->dynamic)) {
- *dl = make_data_loc(0, dyndata - base);
- } else {
- dyndata += ret;
- maxlen -= ret;
+ if (arg->dynamic) {
+ if (unlikely(ret < 0)) {
+ *dl = make_data_loc(0, dyndata - base);
+ } else {
+ dyndata += ret;
+ maxlen -= ret;
+ }
}
}
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x e38e2c6a9efc435f9de344b7c91f7697e01b47d5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072113-repave-bootleg-b466@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
e38e2c6a9efc ("tracing/probes: Fix to update dynamic data counter if fetcharg uses it")
8565a45d0858 ("tracing/probes: Have process_fetch_insn() take a void * instead of pt_regs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e38e2c6a9efc435f9de344b7c91f7697e01b47d5 Mon Sep 17 00:00:00 2001
From: "Masami Hiramatsu (Google)" <mhiramat(a)kernel.org>
Date: Tue, 11 Jul 2023 23:15:48 +0900
Subject: [PATCH] tracing/probes: Fix to update dynamic data counter if
fetcharg uses it
Fix to update dynamic data counter ('dyndata') and max length ('maxlen')
only if the fetcharg uses the dynamic data. Also get out arg->dynamic
from unlikely(). This makes dynamic data address wrong if
process_fetch_insn() returns error on !arg->dynamic case.
Link: https://lore.kernel.org/all/168908494781.123124.8160245359962103684.stgit@d…
Suggested-by: Steven Rostedt <rostedt(a)goodmis.org>
Link: https://lore.kernel.org/all/20230710233400.5aaf024e@gandalf.local.home/
Fixes: 9178412ddf5a ("tracing: probeevent: Return consumed bytes of dynamic area")
Cc: stable(a)vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index ed9d57c6b041..185da001f4c3 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -267,11 +267,13 @@ store_trace_args(void *data, struct trace_probe *tp, void *rec,
if (unlikely(arg->dynamic))
*dl = make_data_loc(maxlen, dyndata - base);
ret = process_fetch_insn(arg->code, rec, dl, base);
- if (unlikely(ret < 0 && arg->dynamic)) {
- *dl = make_data_loc(0, dyndata - base);
- } else {
- dyndata += ret;
- maxlen -= ret;
+ if (arg->dynamic) {
+ if (unlikely(ret < 0)) {
+ *dl = make_data_loc(0, dyndata - base);
+ } else {
+ dyndata += ret;
+ maxlen -= ret;
+ }
}
}
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 66bcf65d6cf0ca6540e2341e88ee7ef02dbdda08
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072141-stride-endorphin-a57a@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
66bcf65d6cf0 ("tracing/probes: Fix to avoid double count of the string length on the array")
b26a124cbfa8 ("tracing/probes: Add symstr type for dynamic events")
7491e2c44278 ("tracing: Add a probe that attaches to trace events")
007517a01995 ("tracing/probe: Change traceprobe_set_print_fmt() to take a type")
fcd9db51df8e ("tracing/probe: Have traceprobe_parse_probe_arg() take a const arg")
bc87cf0a08d4 ("trace: Add a generic function to read/write u64 values from tracefs")
d262271d0483 ("tracing/dynevent: Delegate parsing to create function")
d4d704637d93 ("tracing: Add synthetic event error logging")
9bbb33291f8e ("tracing: Check that the synthetic event and field names are legal")
42d120e2dda5 ("tracing: Move is_good_name() from trace_probe.h to trace.h")
bd82631d7ccd ("tracing: Add support for dynamic strings to synthetic events")
8fbeb52a598c ("tracing: Fix parse_synth_field() error handling")
3aa8fdc37d16 ("tracing/probe: Fix memleak in fetch_op_data operations")
726721a51838 ("tracing: Move synthetic events to a separate file")
1b94b3aed367 ("tracing: Check state.disabled in synth event trace functions")
91ad64a84e9e ("Merge tag 'trace-v5.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 66bcf65d6cf0ca6540e2341e88ee7ef02dbdda08 Mon Sep 17 00:00:00 2001
From: "Masami Hiramatsu (Google)" <mhiramat(a)kernel.org>
Date: Tue, 11 Jul 2023 23:15:29 +0900
Subject: [PATCH] tracing/probes: Fix to avoid double count of the string
length on the array
If an array is specified with the ustring or symstr, the length of the
strings are accumlated on both of 'ret' and 'total', which means the
length is double counted.
Just set the length to the 'ret' value for avoiding double counting.
Link: https://lore.kernel.org/all/168908492917.123124.15076463491122036025.stgit@…
Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Closes: https://lore.kernel.org/all/8819b154-2ba1-43c3-98a2-cbde20892023@moroto.mou…
Fixes: 88903c464321 ("tracing/probe: Add ustring type for user-space string")
Cc: stable(a)vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index 00707630788d..4735c5cb76fa 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -156,11 +156,11 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
code++;
goto array;
case FETCH_OP_ST_USTRING:
- ret += fetch_store_strlen_user(val + code->offset);
+ ret = fetch_store_strlen_user(val + code->offset);
code++;
goto array;
case FETCH_OP_ST_SYMSTR:
- ret += fetch_store_symstrlen(val + code->offset);
+ ret = fetch_store_symstrlen(val + code->offset);
code++;
goto array;
default:
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 66bcf65d6cf0ca6540e2341e88ee7ef02dbdda08
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072140-dropout-strep-a13c@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
66bcf65d6cf0 ("tracing/probes: Fix to avoid double count of the string length on the array")
b26a124cbfa8 ("tracing/probes: Add symstr type for dynamic events")
7491e2c44278 ("tracing: Add a probe that attaches to trace events")
007517a01995 ("tracing/probe: Change traceprobe_set_print_fmt() to take a type")
fcd9db51df8e ("tracing/probe: Have traceprobe_parse_probe_arg() take a const arg")
bc87cf0a08d4 ("trace: Add a generic function to read/write u64 values from tracefs")
d262271d0483 ("tracing/dynevent: Delegate parsing to create function")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 66bcf65d6cf0ca6540e2341e88ee7ef02dbdda08 Mon Sep 17 00:00:00 2001
From: "Masami Hiramatsu (Google)" <mhiramat(a)kernel.org>
Date: Tue, 11 Jul 2023 23:15:29 +0900
Subject: [PATCH] tracing/probes: Fix to avoid double count of the string
length on the array
If an array is specified with the ustring or symstr, the length of the
strings are accumlated on both of 'ret' and 'total', which means the
length is double counted.
Just set the length to the 'ret' value for avoiding double counting.
Link: https://lore.kernel.org/all/168908492917.123124.15076463491122036025.stgit@…
Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Closes: https://lore.kernel.org/all/8819b154-2ba1-43c3-98a2-cbde20892023@moroto.mou…
Fixes: 88903c464321 ("tracing/probe: Add ustring type for user-space string")
Cc: stable(a)vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index 00707630788d..4735c5cb76fa 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -156,11 +156,11 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
code++;
goto array;
case FETCH_OP_ST_USTRING:
- ret += fetch_store_strlen_user(val + code->offset);
+ ret = fetch_store_strlen_user(val + code->offset);
code++;
goto array;
case FETCH_OP_ST_SYMSTR:
- ret += fetch_store_symstrlen(val + code->offset);
+ ret = fetch_store_symstrlen(val + code->offset);
code++;
goto array;
default:
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 66bcf65d6cf0ca6540e2341e88ee7ef02dbdda08
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072139-those-ditch-4de2@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
66bcf65d6cf0 ("tracing/probes: Fix to avoid double count of the string length on the array")
b26a124cbfa8 ("tracing/probes: Add symstr type for dynamic events")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 66bcf65d6cf0ca6540e2341e88ee7ef02dbdda08 Mon Sep 17 00:00:00 2001
From: "Masami Hiramatsu (Google)" <mhiramat(a)kernel.org>
Date: Tue, 11 Jul 2023 23:15:29 +0900
Subject: [PATCH] tracing/probes: Fix to avoid double count of the string
length on the array
If an array is specified with the ustring or symstr, the length of the
strings are accumlated on both of 'ret' and 'total', which means the
length is double counted.
Just set the length to the 'ret' value for avoiding double counting.
Link: https://lore.kernel.org/all/168908492917.123124.15076463491122036025.stgit@…
Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Closes: https://lore.kernel.org/all/8819b154-2ba1-43c3-98a2-cbde20892023@moroto.mou…
Fixes: 88903c464321 ("tracing/probe: Add ustring type for user-space string")
Cc: stable(a)vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index 00707630788d..4735c5cb76fa 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -156,11 +156,11 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
code++;
goto array;
case FETCH_OP_ST_USTRING:
- ret += fetch_store_strlen_user(val + code->offset);
+ ret = fetch_store_strlen_user(val + code->offset);
code++;
goto array;
case FETCH_OP_ST_SYMSTR:
- ret += fetch_store_symstrlen(val + code->offset);
+ ret = fetch_store_symstrlen(val + code->offset);
code++;
goto array;
default:
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 66bcf65d6cf0ca6540e2341e88ee7ef02dbdda08
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072138-nervous-shorten-9868@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
66bcf65d6cf0 ("tracing/probes: Fix to avoid double count of the string length on the array")
b26a124cbfa8 ("tracing/probes: Add symstr type for dynamic events")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 66bcf65d6cf0ca6540e2341e88ee7ef02dbdda08 Mon Sep 17 00:00:00 2001
From: "Masami Hiramatsu (Google)" <mhiramat(a)kernel.org>
Date: Tue, 11 Jul 2023 23:15:29 +0900
Subject: [PATCH] tracing/probes: Fix to avoid double count of the string
length on the array
If an array is specified with the ustring or symstr, the length of the
strings are accumlated on both of 'ret' and 'total', which means the
length is double counted.
Just set the length to the 'ret' value for avoiding double counting.
Link: https://lore.kernel.org/all/168908492917.123124.15076463491122036025.stgit@…
Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Closes: https://lore.kernel.org/all/8819b154-2ba1-43c3-98a2-cbde20892023@moroto.mou…
Fixes: 88903c464321 ("tracing/probe: Add ustring type for user-space string")
Cc: stable(a)vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index 00707630788d..4735c5cb76fa 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -156,11 +156,11 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
code++;
goto array;
case FETCH_OP_ST_USTRING:
- ret += fetch_store_strlen_user(val + code->offset);
+ ret = fetch_store_strlen_user(val + code->offset);
code++;
goto array;
case FETCH_OP_ST_SYMSTR:
- ret += fetch_store_symstrlen(val + code->offset);
+ ret = fetch_store_symstrlen(val + code->offset);
code++;
goto array;
default:
Hi,
The upstream commit below is backported to 5.10.186, 5.15.120 and 6.1.36:
"""
commit ecdf985d7615356b78241fdb159c091830ed0380
Author: Eduard Zingerman <eddyz87(a)gmail.com>
Date: Wed Feb 15 01:20:27 2023 +0200
bpf: track immediate values written to stack by BPF_ST instruction
"""
This commit is causing the following bpf:test_verifier kselftest to fail:
"""
# #760/p precise: ST insn causing spi > allocated_stack FAIL
"""
Since this test didn't fail before ecdf985d76 backport, the question is
if this is a test bug or if this commit introduced a regression.
I haven't checked if this failure is present in latest Linus tree because
I was unable to build & run the bpf kselftests in an older distro.
Also, there some important details about running the bpf kselftests
in 5.10 and 5.15:
* On 5.10, bpf kselftest build is broken. The following upstream
commit needs to be cherry-picked for it to build & run:
"""
commit 4237e9f4a96228ccc8a7abe5e4b30834323cd353
Author: Gilad Reti <gilad.reti(a)gmail.com>
Date: Wed Jan 13 07:38:08 2021 +0200
selftests/bpf: Add verifier test for PTR_TO_MEM spill
"""
* On 5.15.120 there's one additional test that's failing, but I didn't
debug this one:
"""
#150/p calls: trigger reg2btf_ids[reg→type] for reg→type > __BPF_REG_TYPE_MAX FAIL
FAIL
"""
* On 5.11 onwards, building and running bpf tests is disabled by
default by commit 7a6eb7c34a78498742b5f82543b7a68c1c443329, so I wonder
if we want to backport this to 5.10 as well?
Thanks!
- Luiz
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 87a2cbf02d7701255f9fcca7e5bd864a7bb397cf
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072133-plank-glorified-2d3f@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
87a2cbf02d77 ("pwm: meson: fix handling of period/duty if greater than UINT_MAX")
5f97f18feac9 ("pwm: meson: Simplify duplicated per-channel tracking")
437fb760d046 ("pwm: meson: Remove redundant assignment to variable fin_freq")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 87a2cbf02d7701255f9fcca7e5bd864a7bb397cf Mon Sep 17 00:00:00 2001
From: Heiner Kallweit <hkallweit1(a)gmail.com>
Date: Wed, 24 May 2023 21:48:36 +0200
Subject: [PATCH] pwm: meson: fix handling of period/duty if greater than
UINT_MAX
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
state->period/duty are of type u64, and if their value is greater than
UINT_MAX, then the cast to uint will cause problems. Fix this by
changing the type of the respective local variables to u64.
Fixes: b79c3670e120 ("pwm: meson: Don't duplicate the polarity internally")
Cc: stable(a)vger.kernel.org
Suggested-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
Signed-off-by: Heiner Kallweit <hkallweit1(a)gmail.com>
Signed-off-by: Thierry Reding <thierry.reding(a)gmail.com>
diff --git a/drivers/pwm/pwm-meson.c b/drivers/pwm/pwm-meson.c
index 3865538dd2d6..33107204a951 100644
--- a/drivers/pwm/pwm-meson.c
+++ b/drivers/pwm/pwm-meson.c
@@ -156,8 +156,9 @@ static int meson_pwm_calc(struct meson_pwm *meson, struct pwm_device *pwm,
const struct pwm_state *state)
{
struct meson_pwm_channel *channel = &meson->channels[pwm->hwpwm];
- unsigned int duty, period, pre_div, cnt, duty_cnt;
+ unsigned int pre_div, cnt, duty_cnt;
unsigned long fin_freq;
+ u64 duty, period;
duty = state->duty_cycle;
period = state->period;
@@ -179,19 +180,19 @@ static int meson_pwm_calc(struct meson_pwm *meson, struct pwm_device *pwm,
dev_dbg(meson->chip.dev, "fin_freq: %lu Hz\n", fin_freq);
- pre_div = div64_u64(fin_freq * (u64)period, NSEC_PER_SEC * 0xffffLL);
+ pre_div = div64_u64(fin_freq * period, NSEC_PER_SEC * 0xffffLL);
if (pre_div > MISC_CLK_DIV_MASK) {
dev_err(meson->chip.dev, "unable to get period pre_div\n");
return -EINVAL;
}
- cnt = div64_u64(fin_freq * (u64)period, NSEC_PER_SEC * (pre_div + 1));
+ cnt = div64_u64(fin_freq * period, NSEC_PER_SEC * (pre_div + 1));
if (cnt > 0xffff) {
dev_err(meson->chip.dev, "unable to get period cnt\n");
return -EINVAL;
}
- dev_dbg(meson->chip.dev, "period=%u pre_div=%u cnt=%u\n", period,
+ dev_dbg(meson->chip.dev, "period=%llu pre_div=%u cnt=%u\n", period,
pre_div, cnt);
if (duty == period) {
@@ -204,14 +205,13 @@ static int meson_pwm_calc(struct meson_pwm *meson, struct pwm_device *pwm,
channel->lo = cnt;
} else {
/* Then check is we can have the duty with the same pre_div */
- duty_cnt = div64_u64(fin_freq * (u64)duty,
- NSEC_PER_SEC * (pre_div + 1));
+ duty_cnt = div64_u64(fin_freq * duty, NSEC_PER_SEC * (pre_div + 1));
if (duty_cnt > 0xffff) {
dev_err(meson->chip.dev, "unable to get duty cycle\n");
return -EINVAL;
}
- dev_dbg(meson->chip.dev, "duty=%u pre_div=%u duty_cnt=%u\n",
+ dev_dbg(meson->chip.dev, "duty=%llu pre_div=%u duty_cnt=%u\n",
duty, pre_div, duty_cnt);
channel->pre_div = pre_div;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 87a2cbf02d7701255f9fcca7e5bd864a7bb397cf
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072131-widely-ploy-5798@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
87a2cbf02d77 ("pwm: meson: fix handling of period/duty if greater than UINT_MAX")
5f97f18feac9 ("pwm: meson: Simplify duplicated per-channel tracking")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 87a2cbf02d7701255f9fcca7e5bd864a7bb397cf Mon Sep 17 00:00:00 2001
From: Heiner Kallweit <hkallweit1(a)gmail.com>
Date: Wed, 24 May 2023 21:48:36 +0200
Subject: [PATCH] pwm: meson: fix handling of period/duty if greater than
UINT_MAX
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
state->period/duty are of type u64, and if their value is greater than
UINT_MAX, then the cast to uint will cause problems. Fix this by
changing the type of the respective local variables to u64.
Fixes: b79c3670e120 ("pwm: meson: Don't duplicate the polarity internally")
Cc: stable(a)vger.kernel.org
Suggested-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
Signed-off-by: Heiner Kallweit <hkallweit1(a)gmail.com>
Signed-off-by: Thierry Reding <thierry.reding(a)gmail.com>
diff --git a/drivers/pwm/pwm-meson.c b/drivers/pwm/pwm-meson.c
index 3865538dd2d6..33107204a951 100644
--- a/drivers/pwm/pwm-meson.c
+++ b/drivers/pwm/pwm-meson.c
@@ -156,8 +156,9 @@ static int meson_pwm_calc(struct meson_pwm *meson, struct pwm_device *pwm,
const struct pwm_state *state)
{
struct meson_pwm_channel *channel = &meson->channels[pwm->hwpwm];
- unsigned int duty, period, pre_div, cnt, duty_cnt;
+ unsigned int pre_div, cnt, duty_cnt;
unsigned long fin_freq;
+ u64 duty, period;
duty = state->duty_cycle;
period = state->period;
@@ -179,19 +180,19 @@ static int meson_pwm_calc(struct meson_pwm *meson, struct pwm_device *pwm,
dev_dbg(meson->chip.dev, "fin_freq: %lu Hz\n", fin_freq);
- pre_div = div64_u64(fin_freq * (u64)period, NSEC_PER_SEC * 0xffffLL);
+ pre_div = div64_u64(fin_freq * period, NSEC_PER_SEC * 0xffffLL);
if (pre_div > MISC_CLK_DIV_MASK) {
dev_err(meson->chip.dev, "unable to get period pre_div\n");
return -EINVAL;
}
- cnt = div64_u64(fin_freq * (u64)period, NSEC_PER_SEC * (pre_div + 1));
+ cnt = div64_u64(fin_freq * period, NSEC_PER_SEC * (pre_div + 1));
if (cnt > 0xffff) {
dev_err(meson->chip.dev, "unable to get period cnt\n");
return -EINVAL;
}
- dev_dbg(meson->chip.dev, "period=%u pre_div=%u cnt=%u\n", period,
+ dev_dbg(meson->chip.dev, "period=%llu pre_div=%u cnt=%u\n", period,
pre_div, cnt);
if (duty == period) {
@@ -204,14 +205,13 @@ static int meson_pwm_calc(struct meson_pwm *meson, struct pwm_device *pwm,
channel->lo = cnt;
} else {
/* Then check is we can have the duty with the same pre_div */
- duty_cnt = div64_u64(fin_freq * (u64)duty,
- NSEC_PER_SEC * (pre_div + 1));
+ duty_cnt = div64_u64(fin_freq * duty, NSEC_PER_SEC * (pre_div + 1));
if (duty_cnt > 0xffff) {
dev_err(meson->chip.dev, "unable to get duty cycle\n");
return -EINVAL;
}
- dev_dbg(meson->chip.dev, "duty=%u pre_div=%u duty_cnt=%u\n",
+ dev_dbg(meson->chip.dev, "duty=%llu pre_div=%u duty_cnt=%u\n",
duty, pre_div, duty_cnt);
channel->pre_div = pre_div;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 87a2cbf02d7701255f9fcca7e5bd864a7bb397cf
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072132-basket-kinetic-8fdf@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
87a2cbf02d77 ("pwm: meson: fix handling of period/duty if greater than UINT_MAX")
5f97f18feac9 ("pwm: meson: Simplify duplicated per-channel tracking")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 87a2cbf02d7701255f9fcca7e5bd864a7bb397cf Mon Sep 17 00:00:00 2001
From: Heiner Kallweit <hkallweit1(a)gmail.com>
Date: Wed, 24 May 2023 21:48:36 +0200
Subject: [PATCH] pwm: meson: fix handling of period/duty if greater than
UINT_MAX
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
state->period/duty are of type u64, and if their value is greater than
UINT_MAX, then the cast to uint will cause problems. Fix this by
changing the type of the respective local variables to u64.
Fixes: b79c3670e120 ("pwm: meson: Don't duplicate the polarity internally")
Cc: stable(a)vger.kernel.org
Suggested-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
Signed-off-by: Heiner Kallweit <hkallweit1(a)gmail.com>
Signed-off-by: Thierry Reding <thierry.reding(a)gmail.com>
diff --git a/drivers/pwm/pwm-meson.c b/drivers/pwm/pwm-meson.c
index 3865538dd2d6..33107204a951 100644
--- a/drivers/pwm/pwm-meson.c
+++ b/drivers/pwm/pwm-meson.c
@@ -156,8 +156,9 @@ static int meson_pwm_calc(struct meson_pwm *meson, struct pwm_device *pwm,
const struct pwm_state *state)
{
struct meson_pwm_channel *channel = &meson->channels[pwm->hwpwm];
- unsigned int duty, period, pre_div, cnt, duty_cnt;
+ unsigned int pre_div, cnt, duty_cnt;
unsigned long fin_freq;
+ u64 duty, period;
duty = state->duty_cycle;
period = state->period;
@@ -179,19 +180,19 @@ static int meson_pwm_calc(struct meson_pwm *meson, struct pwm_device *pwm,
dev_dbg(meson->chip.dev, "fin_freq: %lu Hz\n", fin_freq);
- pre_div = div64_u64(fin_freq * (u64)period, NSEC_PER_SEC * 0xffffLL);
+ pre_div = div64_u64(fin_freq * period, NSEC_PER_SEC * 0xffffLL);
if (pre_div > MISC_CLK_DIV_MASK) {
dev_err(meson->chip.dev, "unable to get period pre_div\n");
return -EINVAL;
}
- cnt = div64_u64(fin_freq * (u64)period, NSEC_PER_SEC * (pre_div + 1));
+ cnt = div64_u64(fin_freq * period, NSEC_PER_SEC * (pre_div + 1));
if (cnt > 0xffff) {
dev_err(meson->chip.dev, "unable to get period cnt\n");
return -EINVAL;
}
- dev_dbg(meson->chip.dev, "period=%u pre_div=%u cnt=%u\n", period,
+ dev_dbg(meson->chip.dev, "period=%llu pre_div=%u cnt=%u\n", period,
pre_div, cnt);
if (duty == period) {
@@ -204,14 +205,13 @@ static int meson_pwm_calc(struct meson_pwm *meson, struct pwm_device *pwm,
channel->lo = cnt;
} else {
/* Then check is we can have the duty with the same pre_div */
- duty_cnt = div64_u64(fin_freq * (u64)duty,
- NSEC_PER_SEC * (pre_div + 1));
+ duty_cnt = div64_u64(fin_freq * duty, NSEC_PER_SEC * (pre_div + 1));
if (duty_cnt > 0xffff) {
dev_err(meson->chip.dev, "unable to get duty cycle\n");
return -EINVAL;
}
- dev_dbg(meson->chip.dev, "duty=%u pre_div=%u duty_cnt=%u\n",
+ dev_dbg(meson->chip.dev, "duty=%llu pre_div=%u duty_cnt=%u\n",
duty, pre_div, duty_cnt);
channel->pre_div = pre_div;