This is a note to let you know that I've just added the patch titled
xhci: Free the command allocated for setting LPM if we return early
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From f6caea4855553a8b99ba3ec23ecdb5ed8262f26c Mon Sep 17 00:00:00 2001
From: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Date: Thu, 30 Mar 2023 17:30:56 +0300
Subject: xhci: Free the command allocated for setting LPM if we return early
The command allocated to set exit latency LPM values need to be freed in
case the command is never queued. This would be the case if there is no
change in exit latency values, or device is missing.
Reported-by: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr>
Link: https://lore.kernel.org/linux-usb/24263902-c9b3-ce29-237b-1c3d6918f4fe@alu.…
Tested-by: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr>
Fixes: 5c2a380a5aa8 ("xhci: Allocate separate command structures for each LPM command")
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Link: https://lore.kernel.org/r/20230330143056.1390020-4-mathias.nyman@linux.inte…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/host/xhci.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index bdb6dd819a3b..6307bae9cddf 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -4442,6 +4442,7 @@ static int __maybe_unused xhci_change_max_exit_latency(struct xhci_hcd *xhci,
if (!virt_dev || max_exit_latency == virt_dev->current_mel) {
spin_unlock_irqrestore(&xhci->lock, flags);
+ xhci_free_command(xhci, command);
return 0;
}
--
2.40.0
The RX macro codec comes on some platforms in two variants - ADSP
and ADSP bypassed - thus the clock-names varies from 3 to 5. The clocks
must vary as well:
sc7280-idp.dtb: codec@3200000: clocks: [[202, 8], [202, 7], [203]] is too short
Fixes: 852fda58d99a ("ASoC: qcom: dt-bindings: Update bindings for clocks in lpass digital codes")
Cc: <stable(a)vger.kernel.org>
Acked-by: Rob Herring <robh(a)kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
---
Documentation/devicetree/bindings/sound/qcom,lpass-rx-macro.yaml | 1 +
1 file changed, 1 insertion(+)
diff --git a/Documentation/devicetree/bindings/sound/qcom,lpass-rx-macro.yaml b/Documentation/devicetree/bindings/sound/qcom,lpass-rx-macro.yaml
index f8972769cc6a..ec4b0ac8ad68 100644
--- a/Documentation/devicetree/bindings/sound/qcom,lpass-rx-macro.yaml
+++ b/Documentation/devicetree/bindings/sound/qcom,lpass-rx-macro.yaml
@@ -28,6 +28,7 @@ properties:
const: 0
clocks:
+ minItems: 3
maxItems: 5
clock-names:
--
2.34.1
I'm announcing the release of the 5.4.239 kernel.
It just fixes up a permission of one file,
tools/testing/selftests/net/fib_tests.sh, if you don't have an issue with this
specific testing file, no need to upgrade.
The updated 5.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.4.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Greg Kroah-Hartman (1):
Linux 5.4.239
Rishabh Bhatnagar (1):
selftests: Fix the executable permissions for fib_tests.sh
When upreving llvm I realised that kexec stopped working on my test
platform. This patch fixes it.
To: Eric Biederman <ebiederm(a)xmission.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Philipp Rudo <prudo(a)redhat.com>
Cc: kexec(a)lists.infradead.org
Cc: linux-kernel(a)vger.kernel.org
Cc: Ross Zwisler <zwisler(a)google.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
---
Changes in v4:
- Add Cc: stable
- Add linker script for x86
- Add a warning when the kernel image has overlapping sections.
- Link to v3: https://lore.kernel.org/r/20230321-kexec_clang16-v3-0-5f016c8d0e87@chromium…
Changes in v3:
- Fix initial value. Thanks Ross!
- Link to v2: https://lore.kernel.org/r/20230321-kexec_clang16-v2-0-d10e5d517869@chromium…
Changes in v2:
- Fix if condition. Thanks Steven!.
- Update Philipp email. Thanks Baoquan.
- Link to v1: https://lore.kernel.org/r/20230321-kexec_clang16-v1-0-a768fc2c7c4d@chromium…
---
Ricardo Ribalda (2):
kexec: Support purgatories with .text.hot sections
x86/purgatory: Add linker script
arch/x86/purgatory/.gitignore | 2 ++
arch/x86/purgatory/Makefile | 20 +++++++++----
arch/x86/purgatory/kexec-purgatory.S | 2 +-
arch/x86/purgatory/purgatory.lds.S | 57 ++++++++++++++++++++++++++++++++++++
kernel/kexec_file.c | 13 +++++++-
5 files changed, 86 insertions(+), 8 deletions(-)
---
base-commit: 17214b70a159c6547df9ae204a6275d983146f6b
change-id: 20230321-kexec_clang16-4510c23d129c
Best regards,
--
Ricardo Ribalda <ribalda(a)chromium.org>
The OSM/EPSS hardware controls the frequency of each CPU cluster based
on requests from the OS and various throttling events in the system.
While throttling is in effect the related dcvs interrupt will be kept
high. The purpose of the code handling this interrupt is to
continuously report the thermal pressure based on the throttled
frequency.
The reasoning for adding QoS control to this mechanism is not entirely
clear, but the introduction of commit 'c4c0efb06f17 ("cpufreq:
qcom-cpufreq-hw: Add cpufreq qos for LMh")' causes the
scaling_max_frequncy to be set to the throttled frequency. On the next
iteration of polling, the throttled frequency is above or equal to the
newly requested frequency, so the polling is stopped.
With cpufreq limiting the max frequency, the hardware no longer report a
throttling state and no further updates to thermal pressure or qos
state are made.
The result of this is that scaling_max_frequency can only go down, and
the system becomes slower and slower every time a thermal throttling
event is reported by the hardware.
Even if the logic could be improved, there is no reason for software to
limit the max freqency in response to the hardware limiting the max
frequency. At best software will follow the reported hardware state, but
typically it will cause slower backoff of the throttling.
This reverts commit c4c0efb06f17fa4a37ad99e7752b18a5405c76dc.
Fixes: c4c0efb06f17 ("cpufreq: qcom-cpufreq-hw: Add cpufreq qos for LMh")
Cc: stable(a)vger.kernel.org
Reported-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Signed-off-by: Bjorn Andersson <quic_bjorande(a)quicinc.com>
---
drivers/cpufreq/qcom-cpufreq-hw.c | 14 --------------
1 file changed, 14 deletions(-)
diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index 575a4461c25a..1503d315fa7e 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -14,7 +14,6 @@
#include <linux/of_address.h>
#include <linux/of_platform.h>
#include <linux/pm_opp.h>
-#include <linux/pm_qos.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/units.h>
@@ -60,8 +59,6 @@ struct qcom_cpufreq_data {
struct clk_hw cpu_clk;
bool per_core_dcvs;
-
- struct freq_qos_request throttle_freq_req;
};
static struct {
@@ -351,8 +348,6 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
throttled_freq = freq_hz / HZ_PER_KHZ;
- freq_qos_update_request(&data->throttle_freq_req, throttled_freq);
-
/* Update thermal pressure (the boost frequencies are accepted) */
arch_update_thermal_pressure(policy->related_cpus, throttled_freq);
@@ -445,14 +440,6 @@ static int qcom_cpufreq_hw_lmh_init(struct cpufreq_policy *policy, int index)
if (data->throttle_irq < 0)
return data->throttle_irq;
- ret = freq_qos_add_request(&policy->constraints,
- &data->throttle_freq_req, FREQ_QOS_MAX,
- FREQ_QOS_MAX_DEFAULT_VALUE);
- if (ret < 0) {
- dev_err(&pdev->dev, "Failed to add freq constraint (%d)\n", ret);
- return ret;
- }
-
data->cancel_throttle = false;
data->policy = policy;
@@ -519,7 +506,6 @@ static void qcom_cpufreq_hw_lmh_exit(struct qcom_cpufreq_data *data)
if (data->throttle_irq <= 0)
return;
- freq_qos_remove_request(&data->throttle_freq_req);
free_irq(data->throttle_irq, data);
}
--
2.25.1
Device exclusive page table entries are used to prevent CPU access to
a page whilst it is being accessed from a device. Typically this is
used to implement atomic operations when the underlying bus does not
support atomic access. When a CPU thread encounters a device exclusive
entry it locks the page and restores the original entry after calling
mmu notifiers to signal drivers that exclusive access is no longer
available.
The device exclusive entry holds a reference to the page making it
safe to access the struct page whilst the entry is present. However
the fault handling code does not hold the PTL when taking the page
lock. This means if there are multiple threads faulting concurrently
on the device exclusive entry one will remove the entry whilst others
will wait on the page lock without holding a reference.
This can lead to threads locking or waiting on a page with a zero
refcount. Whilst mmap_lock prevents the pages getting freed via
munmap() they may still be freed by a migration. This leads to
warnings such as PAGE_FLAGS_CHECK_AT_FREE due to the page being locked
when the refcount drops to zero. Note that during removal of the
device exclusive entry the PTE is currently re-checked under the PTL
so no futher bad page accesses occur once it is locked.
Signed-off-by: Alistair Popple <apopple(a)nvidia.com>
Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
Cc: stable(a)vger.kernel.org
---
mm/memory.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/mm/memory.c b/mm/memory.c
index 8c8420934d60..b499bd283d8e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3623,8 +3623,19 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
struct vm_area_struct *vma = vmf->vma;
struct mmu_notifier_range range;
- if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags))
+ /*
+ * We need a page reference to lock the page because we don't
+ * hold the PTL so a racing thread can remove the
+ * device-exclusive entry and unmap the page. If the page is
+ * free the entry must have been removed already.
+ */
+ if (!get_page_unless_zero(vmf->page))
+ return 0;
+
+ if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags)) {
+ put_page(vmf->page);
return VM_FAULT_RETRY;
+ }
mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0, vma,
vma->vm_mm, vmf->address & PAGE_MASK,
(vmf->address & PAGE_MASK) + PAGE_SIZE, NULL);
@@ -3637,6 +3648,7 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
pte_unmap_unlock(vmf->pte, vmf->ptl);
folio_unlock(folio);
+ put_page(vmf->page);
mmu_notifier_invalidate_range_end(&range);
return 0;
--
2.39.2
The quilt patch titled
Subject: mm: take a page reference when removing device exclusive entries
has been removed from the -mm tree. Its filename was
mm-take-a-page-reference-when-removing-device-exclusive-entries.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Alistair Popple <apopple(a)nvidia.com>
Subject: mm: take a page reference when removing device exclusive entries
Date: Tue, 28 Mar 2023 13:14:34 +1100
Device exclusive page table entries are used to prevent CPU access to a
page whilst it is being accessed from a device. Typically this is used to
implement atomic operations when the underlying bus does not support
atomic access. When a CPU thread encounters a device exclusive entry it
locks the page and restores the original entry after calling mmu notifiers
to signal drivers that exclusive access is no longer available.
The device exclusive entry holds a reference to the page making it safe to
access the struct page whilst the entry is present. However the fault
handling code does not hold the PTL when taking the page lock. This means
if there are multiple threads faulting concurrently on the device
exclusive entry one will remove the entry whilst others will wait on the
page lock without holding a reference.
This can lead to threads locking or waiting on a page with a zero
refcount. Whilst mmap_lock prevents the pages getting freed via munmap()
they may still be freed by a migration. This leads to warnings such as
PAGE_FLAGS_CHECK_AT_FREE due to the page being locked when the refcount
drops to zero. Note that during removal of the device exclusive entry the
PTE is currently re-checked under the PTL so no futher bad page accesses
occur once it is locked.
Link: https://lkml.kernel.org/r/20230328021434.292971-1-apopple@nvidia.com
Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
Signed-off-by: Alistair Popple <apopple(a)nvidia.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Reviewed-by: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
--- a/mm/memory.c~mm-take-a-page-reference-when-removing-device-exclusive-entries
+++ a/mm/memory.c
@@ -3563,8 +3563,19 @@ static vm_fault_t remove_device_exclusiv
struct vm_area_struct *vma = vmf->vma;
struct mmu_notifier_range range;
- if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags))
+ /*
+ * We need a page reference to lock the page because we don't hold the
+ * PTL so a racing thread can remove the device-exclusive entry and
+ * unmap the page. If the page is free the entry must have been
+ * removed already.
+ */
+ if (!get_page_unless_zero(vmf->page))
+ return 0;
+
+ if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags)) {
+ put_page(vmf->page);
return VM_FAULT_RETRY;
+ }
mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0,
vma->vm_mm, vmf->address & PAGE_MASK,
(vmf->address & PAGE_MASK) + PAGE_SIZE, NULL);
@@ -3577,6 +3588,7 @@ static vm_fault_t remove_device_exclusiv
pte_unmap_unlock(vmf->pte, vmf->ptl);
folio_unlock(folio);
+ put_page(vmf->page);
mmu_notifier_invalidate_range_end(&range);
return 0;
_
Patches currently in -mm which might be from apopple(a)nvidia.com are
The patch titled
Subject: mm: khugepaged: fix kernel BUG in hpage_collapse_scan_file()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-khugepaged-fix-kernel-bug-in-hpage_collapse_scan_file.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ivan Orlov <ivan.orlov0322(a)gmail.com>
Subject: mm: khugepaged: fix kernel BUG in hpage_collapse_scan_file()
Date: Wed, 29 Mar 2023 18:53:30 +0400
Syzkaller reported the following issue:
kernel BUG at mm/khugepaged.c:1823!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 5097 Comm: syz-executor220 Not tainted 6.2.0-syzkaller-13154-g857f1268a591 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/16/2023
RIP: 0010:collapse_file mm/khugepaged.c:1823 [inline]
RIP: 0010:hpage_collapse_scan_file+0x67c8/0x7580 mm/khugepaged.c:2233
Code: 00 00 89 de e8 c9 66 a3 ff 31 ff 89 de e8 c0 66 a3 ff 45 84 f6 0f 85 28 0d 00 00 e8 22 64 a3 ff e9 dc f7 ff ff e8 18 64 a3 ff <0f> 0b f3 0f 1e fa e8 0d 64 a3 ff e9 93 f6 ff ff f3 0f 1e fa 4c 89
RSP: 0018:ffffc90003dff4e0 EFLAGS: 00010093
RAX: ffffffff81e95988 RBX: 00000000000001c1 RCX: ffff8880205b3a80
RDX: 0000000000000000 RSI: 00000000000001c0 RDI: 00000000000001c1
RBP: ffffc90003dff830 R08: ffffffff81e90e67 R09: fffffbfff1a433c3
R10: 0000000000000000 R11: dffffc0000000001 R12: 0000000000000000
R13: ffffc90003dff6c0 R14: 00000000000001c0 R15: 0000000000000000
FS: 00007fdbae5ee700(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fdbae6901e0 CR3: 000000007b2dd000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
madvise_collapse+0x721/0xf50 mm/khugepaged.c:2693
madvise_vma_behavior mm/madvise.c:1086 [inline]
madvise_walk_vmas mm/madvise.c:1260 [inline]
do_madvise+0x9e5/0x4680 mm/madvise.c:1439
__do_sys_madvise mm/madvise.c:1452 [inline]
__se_sys_madvise mm/madvise.c:1450 [inline]
__x64_sys_madvise+0xa5/0xb0 mm/madvise.c:1450
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The xas_store() call during page cache scanning can potentially translate
'xas' into the error state (with the reproducer provided by the syzkaller
the error code is -ENOMEM). However, there are no further checks after
the 'xas_store', and the next call of 'xas_next' at the start of the
scanning cycle doesn't increase the xa_index, and the issue occurs.
This patch will add the xarray state error checking after the xas_store()
and the corresponding result error code.
Tested via syzbot.
Link: https://lkml.kernel.org/r/20230329145330.23191-1-ivan.orlov0322@gmail.com
Link: https://syzkaller.appspot.com/bug?id=7d6bb3760e026ece7524500fe44fb024a0e959…
Signed-off-by: Ivan Orlov <ivan.orlov0322(a)gmail.com>
Reported-by: syzbot+9578faa5475acb35fa50(a)syzkaller.appspotmail.com
Cc: Himadri Pandya <himadrispandya(a)gmail.com>
Cc: Ivan Orlov <ivan.orlov0322(a)gmail.com>
Cc: Shuah Khan <skhan(a)linuxfoundation.org>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 10 ++++++++++
1 file changed, 10 insertions(+)
--- a/mm/khugepaged.c~mm-khugepaged-fix-kernel-bug-in-hpage_collapse_scan_file
+++ a/mm/khugepaged.c
@@ -55,6 +55,7 @@ enum scan_result {
SCAN_CGROUP_CHARGE_FAIL,
SCAN_TRUNCATED,
SCAN_PAGE_HAS_PRIVATE,
+ SCAN_STORE_FAILED,
};
#define CREATE_TRACE_POINTS
@@ -1840,6 +1841,15 @@ static int collapse_file(struct mm_struc
goto xa_locked;
}
xas_store(&xas, hpage);
+ if (xas_error(&xas)) {
+ /* revert shmem_charge performed
+ * in the previous condition
+ */
+ mapping->nrpages--;
+ shmem_uncharge(mapping->host, 1);
+ result = SCAN_STORE_FAILED;
+ goto xa_locked;
+ }
nr_none++;
continue;
}
_
Patches currently in -mm which might be from ivan.orlov0322(a)gmail.com are
mm-khugepaged-fix-kernel-bug-in-hpage_collapse_scan_file.patch
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 97a71c444a147ae41c7d0ab5b3d855d7f762f3ed
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '167812333979118(a)kroah.com' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
97a71c444a14 ("KVM: x86: Purge "highest ISR" cache when updating APICv state")
ce0a58f4756c ("KVM: x86: Move "apicv_active" into "struct kvm_lapic"")
d39850f57d21 ("KVM: x86: Drop @vcpu parameter from kvm_x86_ops.hwapic_isr_update()")
47e8eec83262 ("Merge tag 'kvmarm-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 97a71c444a147ae41c7d0ab5b3d855d7f762f3ed Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 6 Jan 2023 01:12:35 +0000
Subject: [PATCH] KVM: x86: Purge "highest ISR" cache when updating APICv state
Purge the "highest ISR" cache when updating APICv state on a vCPU. The
cache must not be used when APICv is active as hardware may emulate EOIs
(and other operations) without exiting to KVM.
This fixes a bug where KVM will effectively block IRQs in perpetuity due
to the "highest ISR" never getting reset if APICv is activated on a vCPU
while an IRQ is in-service. Hardware emulates the EOI and KVM never gets
a chance to update its cache.
Fixes: b26a695a1d78 ("kvm: lapic: Introduce APICv update helper function")
Cc: stable(a)vger.kernel.org
Cc: Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>
Cc: Maxim Levitsky <mlevitsk(a)redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini(a)redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20230106011306.85230-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5c0f93fc073a..33a661d82da7 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2424,6 +2424,7 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
*/
apic->isr_count = count_vectors(apic->regs + APIC_ISR);
}
+ apic->highest_isr_cache = -1;
}
void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
@@ -2479,7 +2480,6 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
kvm_lapic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
}
kvm_apic_update_apicv(vcpu);
- apic->highest_isr_cache = -1;
update_divide_count(apic);
atomic_set(&apic->lapic_timer.pending, 0);
@@ -2767,7 +2767,6 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
__start_apic_timer(apic, APIC_TMCCT);
kvm_lapic_set_reg(apic, APIC_TMCCT, 0);
kvm_apic_update_apicv(vcpu);
- apic->highest_isr_cache = -1;
if (apic->apicv_active) {
static_call_cond(kvm_x86_apicv_post_state_restore)(vcpu);
static_call_cond(kvm_x86_hwapic_irr_update)(vcpu, apic_find_highest_irr(apic));
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x ab52be1b310bcb39e6745d34a8f0e8475d67381a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '167812345411383(a)kroah.com' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
ab52be1b310b ("KVM: x86: Inject #GP on x2APIC WRMSR that sets reserved bits 63:32")
a57a31684d7b ("KVM: x86: Treat x2APIC's ICR as a 64-bit register, not two 32-bit regs")
5429478d038f ("KVM: x86: Add helpers to handle 64-bit APIC MSR read/writes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ab52be1b310bcb39e6745d34a8f0e8475d67381a Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Sat, 7 Jan 2023 01:10:21 +0000
Subject: [PATCH] KVM: x86: Inject #GP on x2APIC WRMSR that sets reserved bits
63:32
Reject attempts to set bits 63:32 for 32-bit x2APIC registers, i.e. all
x2APIC registers except ICR. Per Intel's SDM:
Non-zero writes (by WRMSR instruction) to reserved bits to these
registers will raise a general protection fault exception
Opportunistically fix a typo in a nearby comment.
Reported-by: Marc Orr <marcorr(a)google.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Link: https://lore.kernel.org/r/20230107011025.565472-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9aca006b2d22..814b65106057 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -3114,13 +3114,17 @@ static int kvm_lapic_msr_read(struct kvm_lapic *apic, u32 reg, u64 *data)
static int kvm_lapic_msr_write(struct kvm_lapic *apic, u32 reg, u64 data)
{
/*
- * ICR is a 64-bit register in x2APIC mode (and Hyper'v PV vAPIC) and
+ * ICR is a 64-bit register in x2APIC mode (and Hyper-V PV vAPIC) and
* can be written as such, all other registers remain accessible only
* through 32-bit reads/writes.
*/
if (reg == APIC_ICR)
return kvm_x2apic_icr_write(apic, data);
+ /* Bits 63:32 are reserved in all other registers. */
+ if (data >> 32)
+ return 1;
+
return kvm_lapic_reg_write(apic, reg, (u32)data);
}
Commit 52f04f10b900 ("thermal: intel: int340x: processor_thermal: Fix
deadlock") addressed deadlock issue during user space trip update. But it
missed a case when thermal zone device is disabled when user writes 0.
Call to thermal_zone_device_disable() also causes deadlock as it also
tries to lock tz->lock, which is already claimed by trip_point_temp_store()
in the thermal core code.
Remove call to thermal_zone_device_disable() in the function
sys_set_trip_temp(), which is called from trip_point_temp_store().
Fixes: 52f04f10b900 ("thermal: intel: int340x: processor_thermal: Fix deadlock")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
Cc: stable(a)vger.kernel.org # 6.2+
---
.../thermal/intel/int340x_thermal/processor_thermal_device_pci.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c
index 90526f46c9b1..d71ee50e7878 100644
--- a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c
+++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c
@@ -153,7 +153,6 @@ static int sys_set_trip_temp(struct thermal_zone_device *tzd, int trip, int temp
cancel_delayed_work_sync(&pci_info->work);
proc_thermal_mmio_write(pci_info, PROC_THERMAL_MMIO_INT_ENABLE_0, 0);
proc_thermal_mmio_write(pci_info, PROC_THERMAL_MMIO_THRES_0, 0);
- thermal_zone_device_disable(tzd);
pci_info->stored_thres = 0;
return 0;
}
--
2.39.1
commit 727209376f4998bc84db1d5d8af15afea846a92b upstream.
Commit b041b525dab9 ("x86/split_lock: Make life miserable for split lockers")
changed the way the split lock detector works when in "warn" mode;
basically, it not only shows the warn message, but also intentionally
introduces a slowdown through sleeping plus serialization mechanism
on such task. Based on discussions in [0], seems the warning alone
wasn't enough motivation for userspace developers to fix their
applications.
This slowdown is enough to totally break some proprietary (aka.
unfixable) userspace[1].
Happens that originally the proposal in [0] was to add a new mode
which would warns + slowdown the "split locking" task, keeping the
old warn mode untouched. In the end, that idea was discarded and
the regular/default "warn" mode now slows down the applications. This
is quite aggressive with regards proprietary/legacy programs that
basically are unable to properly run in kernel with this change.
While it is understandable that a malicious application could DoS
by split locking, it seems unacceptable to regress old/proprietary
userspace programs through a default configuration that previously
worked. An example of such breakage was reported in [1].
Add a sysctl to allow controlling the "misery mode" behavior, as per
Thomas suggestion on [2]. This way, users running legacy and/or
proprietary software are allowed to still execute them with a decent
performance while still observing the warning messages on kernel log.
[0] https://lore.kernel.org/lkml/20220217012721.9694-1-tony.luck@intel.com/
[1] https://github.com/doitsujin/dxvk/issues/2938
[2] https://lore.kernel.org/lkml/87pmf4bter.ffs@tglx/
[ dhansen: minor changelog tweaks, including clarifying the actual
problem ]
Fixes: b041b525dab9 ("x86/split_lock: Make life miserable for split lockers")
Suggested-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Tony Luck <tony.luck(a)intel.com>
Tested-by: Andre Almeida <andrealmeid(a)igalia.com>
Link: https://lore.kernel.org/all/20221024200254.635256-1-gpiccoli%40igalia.com
---
Hi folks, I've build tested this on both 6.0.13 and 6.1, worked fine. The
split lock detector code changed almost nothing since 6.0, so that makes
sense...
I think this is important to have in stable, some gaming community members
seems excited with that, it'll help with general proprietary software
(that is basically unfixable), making them run smoothly on 6.0.y and 6.1.y.
I've CCed some folks more than just the stable list, to gather more
opinions on that, so apologies if you received this email but think
that you shouldn't have.
Thanks in advance,
Guilherme
Documentation/admin-guide/sysctl/kernel.rst | 23 ++++++++
arch/x86/kernel/cpu/intel.c | 63 +++++++++++++++++----
2 files changed, 76 insertions(+), 10 deletions(-)
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 98d1b198b2b4..c2c64c1b706f 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1314,6 +1314,29 @@ watchdog work to be queued by the watchdog timer function, otherwise the NMI
watchdog — if enabled — can detect a hard lockup condition.
+split_lock_mitigate (x86 only)
+==============================
+
+On x86, each "split lock" imposes a system-wide performance penalty. On larger
+systems, large numbers of split locks from unprivileged users can result in
+denials of service to well-behaved and potentially more important users.
+
+The kernel mitigates these bad users by detecting split locks and imposing
+penalties: forcing them to wait and only allowing one core to execute split
+locks at a time.
+
+These mitigations can make those bad applications unbearably slow. Setting
+split_lock_mitigate=0 may restore some application performance, but will also
+increase system exposure to denial of service attacks from split lock users.
+
+= ===================================================================
+0 Disable the mitigation mode - just warns the split lock on kernel log
+ and exposes the system to denials of service from the split lockers.
+1 Enable the mitigation mode (this is the default) - penalizes the split
+ lockers with intentional performance degradation.
+= ===================================================================
+
+
stack_erasing
=============
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 2d7ea5480ec3..427899650483 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1034,8 +1034,32 @@ static const struct {
static struct ratelimit_state bld_ratelimit;
+static unsigned int sysctl_sld_mitigate = 1;
static DEFINE_SEMAPHORE(buslock_sem);
+#ifdef CONFIG_PROC_SYSCTL
+static struct ctl_table sld_sysctls[] = {
+ {
+ .procname = "split_lock_mitigate",
+ .data = &sysctl_sld_mitigate,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_douintvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
+ {}
+};
+
+static int __init sld_mitigate_sysctl_init(void)
+{
+ register_sysctl_init("kernel", sld_sysctls);
+ return 0;
+}
+
+late_initcall(sld_mitigate_sysctl_init);
+#endif
+
static inline bool match_option(const char *arg, int arglen, const char *opt)
{
int len = strlen(opt), ratelimit;
@@ -1146,12 +1170,20 @@ static void split_lock_init(void)
split_lock_verify_msr(sld_state != sld_off);
}
-static void __split_lock_reenable(struct work_struct *work)
+static void __split_lock_reenable_unlock(struct work_struct *work)
{
sld_update_msr(true);
up(&buslock_sem);
}
+static DECLARE_DELAYED_WORK(sl_reenable_unlock, __split_lock_reenable_unlock);
+
+static void __split_lock_reenable(struct work_struct *work)
+{
+ sld_update_msr(true);
+}
+static DECLARE_DELAYED_WORK(sl_reenable, __split_lock_reenable);
+
/*
* If a CPU goes offline with pending delayed work to re-enable split lock
* detection then the delayed work will be executed on some other CPU. That
@@ -1169,10 +1201,9 @@ static int splitlock_cpu_offline(unsigned int cpu)
return 0;
}
-static DECLARE_DELAYED_WORK(split_lock_reenable, __split_lock_reenable);
-
static void split_lock_warn(unsigned long ip)
{
+ struct delayed_work *work;
int cpu;
if (!current->reported_split_lock)
@@ -1180,14 +1211,26 @@ static void split_lock_warn(unsigned long ip)
current->comm, current->pid, ip);
current->reported_split_lock = 1;
- /* misery factor #1, sleep 10ms before trying to execute split lock */
- if (msleep_interruptible(10) > 0)
- return;
- /* Misery factor #2, only allow one buslocked disabled core at a time */
- if (down_interruptible(&buslock_sem) == -EINTR)
- return;
+ if (sysctl_sld_mitigate) {
+ /*
+ * misery factor #1:
+ * sleep 10ms before trying to execute split lock.
+ */
+ if (msleep_interruptible(10) > 0)
+ return;
+ /*
+ * Misery factor #2:
+ * only allow one buslocked disabled core at a time.
+ */
+ if (down_interruptible(&buslock_sem) == -EINTR)
+ return;
+ work = &sl_reenable_unlock;
+ } else {
+ work = &sl_reenable;
+ }
+
cpu = get_cpu();
- schedule_delayed_work_on(cpu, &split_lock_reenable, 2);
+ schedule_delayed_work_on(cpu, work, 2);
/* Disable split lock detection on this CPU to make progress */
sld_update_msr(false);
--
2.38.1
From: Guennadi Liakhovetski <guennadi.liakhovetski(a)linux.intel.com>
If an IPC4 topology contains an unsupported widget, its .module_info
field won't be set, then sof_ipc4_route_setup() will cause a kernel
Oops trying to dereference it. Add a check for such cases.
Cc: stable(a)vger.kernel.org # 6.2
Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski(a)linux.intel.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi(a)linux.intel.com>
---
Hi Mark,
This patch is generated on top of 6.3-rc4, it will have conflict with asoc-next
because we have ChainDMA scheduled for 6.4 in there.
I should have taken this patch a faster track, but missed it when arranging the
patches, features.
We noticed this when trying to use our development IPC4 topologies with mainline
which does not yet able to handle the process module types (slated fro 6.4).
IPC4 is still evolving so it is not rare that fw/tplg/kernel needs to be
lock-stepped, but NULL pointer dereference should not happen.
This is how the merge conflict resolution should end up between 6.3 and 6.4:
int ret;
/* no route set up if chain DMA is used */
if (src_pipeline->use_chain_dma || sink_pipeline->use_chain_dma) {
if (!src_pipeline->use_chain_dma || !sink_pipeline->use_chain_dma) {
dev_err(sdev->dev,
"use_chain_dma must be set for both src %s and sink %s pipelines\n",
src_widget->widget->name, sink_widget->widget->name);
return -EINVAL;
}
return 0;
}
if (!src_fw_module || !sink_fw_module) {
/* The NULL module will print as "(efault)" */
dev_err(sdev->dev, "source %s or sink %s widget weren't set up properly\n",
src_fw_module->man4_module_entry.name,
sink_fw_module->man4_module_entry.name);
return -ENODEV;
}
sroute->src_queue_id = sof_ipc4_get_queue_id(src_widget, sink_widget,
SOF_PIN_TYPE_SOURCE);
Can you send this patch for 6.3 cycle?
Thank you,
Peter
sound/soc/sof/ipc4-topology.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/sound/soc/sof/ipc4-topology.c b/sound/soc/sof/ipc4-topology.c
index a623707c8ffc..669b99a4f76e 100644
--- a/sound/soc/sof/ipc4-topology.c
+++ b/sound/soc/sof/ipc4-topology.c
@@ -1805,6 +1805,14 @@ static int sof_ipc4_route_setup(struct snd_sof_dev *sdev, struct snd_sof_route *
u32 header, extension;
int ret;
+ if (!src_fw_module || !sink_fw_module) {
+ /* The NULL module will print as "(efault)" */
+ dev_err(sdev->dev, "source %s or sink %s widget weren't set up properly\n",
+ src_fw_module->man4_module_entry.name,
+ sink_fw_module->man4_module_entry.name);
+ return -ENODEV;
+ }
+
sroute->src_queue_id = sof_ipc4_get_queue_id(src_widget, sink_widget,
SOF_PIN_TYPE_SOURCE);
if (sroute->src_queue_id < 0) {
--
2.40.0
We got a WARNING in ext4_add_complete_io:
==================================================================
WARNING: at fs/ext4/page-io.c:231 ext4_put_io_end_defer+0x182/0x250
CPU: 10 PID: 77 Comm: ksoftirqd/10 Tainted: 6.3.0-rc2 #85
RIP: 0010:ext4_put_io_end_defer+0x182/0x250 [ext4]
[...]
Call Trace:
<TASK>
ext4_end_bio+0xa8/0x240 [ext4]
bio_endio+0x195/0x310
blk_update_request+0x184/0x770
scsi_end_request+0x2f/0x240
scsi_io_completion+0x75/0x450
scsi_finish_command+0xef/0x160
scsi_complete+0xa3/0x180
blk_complete_reqs+0x60/0x80
blk_done_softirq+0x25/0x40
__do_softirq+0x119/0x4c8
run_ksoftirqd+0x42/0x70
smpboot_thread_fn+0x136/0x3c0
kthread+0x140/0x1a0
ret_from_fork+0x2c/0x50
==================================================================
Above issue may happen as follows:
cpu1 cpu2
----------------------------|----------------------------
mount -o dioread_lock
ext4_writepages
ext4_do_writepages
*if (ext4_should_dioread_nolock(inode))*
// rsv_blocks is not assigned here
mount -o remount,dioread_nolock
ext4_journal_start_with_reserve
__ext4_journal_start
__ext4_journal_start_sb
jbd2__journal_start
*if (rsv_blocks)*
// h_rsv_handle is not initialized here
mpage_map_and_submit_extent
mpage_map_one_extent
dioread_nolock = ext4_should_dioread_nolock(inode)
if (dioread_nolock && (map->m_flags & EXT4_MAP_UNWRITTEN))
mpd->io_submit.io_end->handle = handle->h_rsv_handle
ext4_set_io_unwritten_flag
io_end->flag |= EXT4_IO_END_UNWRITTEN
// now io_end->handle is NULL but has EXT4_IO_END_UNWRITTEN flag
scsi_finish_command
scsi_io_completion
scsi_io_completion_action
scsi_end_request
blk_update_request
req_bio_endio
bio_endio
bio->bi_end_io > ext4_end_bio
ext4_put_io_end_defer
ext4_add_complete_io
// trigger WARN_ON(!io_end->handle && sbi->s_journal);
The immediate cause of this problem is that ext4_should_dioread_nolock()
function returns inconsistent values in the ext4_do_writepages() and
mpage_map_one_extent(). There are four conditions in this function that
can be changed at mount time to cause this problem. These four conditions
can be divided into two categories:
(1) journal_data and EXT4_EXTENTS_FL, which can be changed by ioctl
(2) DELALLOC and DIOREAD_NOLOCK, which can be changed by remount
The two in the first category have been fixed by commit c8585c6fcaf2
("ext4: fix races between changing inode journal mode and ext4_writepages")
and commit cb85f4d23f79 ("ext4: fix race between writepages and enabling
EXT4_EXTENTS_FL") respectively.
Two cases in the other category have not yet been fixed, and the above
issue is caused by this situation. We refer to the fix for the first
category, when applying options during remount, we grab s_writepages_rwsem
to avoid racing with writepages ops to trigger this problem.
Fixes: 6b523df4fb5a ("ext4: use transaction reservation for extent conversion in ext4_end_io")
Cc: stable(a)vger.kernel.org
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
---
V1->V2:
Grab s_writepages_rwsem unconditionally during remount.
Remove patches 1,2 that are no longer needed.
V2->V3:
Also grab s_writepages_rwsem when restoring options.
fs/ext4/ext4.h | 3 ++-
fs/ext4/super.c | 12 ++++++++++++
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 9b2cfc32cf78..5f5ee0c20673 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1703,7 +1703,8 @@ struct ext4_sb_info {
/*
* Barrier between writepages ops and changing any inode's JOURNAL_DATA
- * or EXTENTS flag.
+ * or EXTENTS flag or between writepages ops and changing DELALLOC or
+ * DIOREAD_NOLOCK mount options on remount.
*/
struct percpu_rw_semaphore s_writepages_rwsem;
struct dax_device *s_daxdev;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e6d84c1e34a4..8396da483c17 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -6403,7 +6403,16 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
}
+ /*
+ * Changing the DIOREAD_NOLOCK or DELALLOC mount options may cause
+ * two calls to ext4_should_dioread_nolock() to return inconsistent
+ * values, triggering WARN_ON in ext4_add_complete_io(). we grab
+ * here s_writepages_rwsem to avoid race between writepages ops and
+ * remount.
+ */
+ percpu_down_write(&sbi->s_writepages_rwsem);
ext4_apply_options(fc, sb);
+ percpu_up_write(&sbi->s_writepages_rwsem);
if ((old_opts.s_mount_opt & EXT4_MOUNT_JOURNAL_CHECKSUM) ^
test_opt(sb, JOURNAL_CHECKSUM)) {
@@ -6614,6 +6623,7 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
return 0;
restore_opts:
+ percpu_down_write(&sbi->s_writepages_rwsem);
sb->s_flags = old_sb_flags;
sbi->s_mount_opt = old_opts.s_mount_opt;
sbi->s_mount_opt2 = old_opts.s_mount_opt2;
@@ -6622,6 +6632,8 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
sbi->s_commit_interval = old_opts.s_commit_interval;
sbi->s_min_batch_time = old_opts.s_min_batch_time;
sbi->s_max_batch_time = old_opts.s_max_batch_time;
+ percpu_up_write(&sbi->s_writepages_rwsem);
+
if (!test_opt(sb, BLOCK_VALIDITY) && sbi->s_system_blks)
ext4_release_system_zone(sb);
#ifdef CONFIG_QUOTA
--
2.31.1
The patch below does not apply to the 6.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.2.y
git checkout FETCH_HEAD
git cherry-pick -x 88b170088ad2c3e27086fe35769aa49f8a512564
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '1680003625145213(a)kroah.com' --subject-prefix 'PATCH 6.2.y' HEAD^..
Possible dependencies:
88b170088ad2 ("zonefs: Fix error message in zonefs_file_dio_append()")
aa7f243f32e1 ("zonefs: Separate zone information from inode information")
34422914dc00 ("zonefs: Reduce struct zonefs_inode_info size")
46a9c526eef7 ("zonefs: Simplify IO error handling")
4008e2a0b01a ("zonefs: Reorganize code")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 88b170088ad2c3e27086fe35769aa49f8a512564 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Date: Mon, 20 Mar 2023 22:49:15 +0900
Subject: [PATCH] zonefs: Fix error message in zonefs_file_dio_append()
Since the expected write location in a sequential file is always at the
end of the file (append write), when an invalid write append location is
detected in zonefs_file_dio_append(), print the invalid written location
instead of the expected write location.
Fixes: a608da3bd730 ("zonefs: Detect append writes at invalid locations")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani(a)oracle.com>
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index a545a6d9a32e..617e4f9db42e 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -426,7 +426,7 @@ static ssize_t zonefs_file_dio_append(struct kiocb *iocb, struct iov_iter *from)
if (bio->bi_iter.bi_sector != wpsector) {
zonefs_warn(inode->i_sb,
"Corrupted write pointer %llu for zone at %llu\n",
- wpsector, z->z_sector);
+ bio->bi_iter.bi_sector, z->z_sector);
ret = -EIO;
}
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 88b170088ad2c3e27086fe35769aa49f8a512564
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '1680003630102245(a)kroah.com' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
88b170088ad2 ("zonefs: Fix error message in zonefs_file_dio_append()")
aa7f243f32e1 ("zonefs: Separate zone information from inode information")
34422914dc00 ("zonefs: Reduce struct zonefs_inode_info size")
46a9c526eef7 ("zonefs: Simplify IO error handling")
4008e2a0b01a ("zonefs: Reorganize code")
a608da3bd730 ("zonefs: Detect append writes at invalid locations")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 88b170088ad2c3e27086fe35769aa49f8a512564 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Date: Mon, 20 Mar 2023 22:49:15 +0900
Subject: [PATCH] zonefs: Fix error message in zonefs_file_dio_append()
Since the expected write location in a sequential file is always at the
end of the file (append write), when an invalid write append location is
detected in zonefs_file_dio_append(), print the invalid written location
instead of the expected write location.
Fixes: a608da3bd730 ("zonefs: Detect append writes at invalid locations")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani(a)oracle.com>
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index a545a6d9a32e..617e4f9db42e 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -426,7 +426,7 @@ static ssize_t zonefs_file_dio_append(struct kiocb *iocb, struct iov_iter *from)
if (bio->bi_iter.bi_sector != wpsector) {
zonefs_warn(inode->i_sb,
"Corrupted write pointer %llu for zone at %llu\n",
- wpsector, z->z_sector);
+ bio->bi_iter.bi_sector, z->z_sector);
ret = -EIO;
}
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 88b170088ad2c3e27086fe35769aa49f8a512564
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '1680003631103149(a)kroah.com' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
88b170088ad2 ("zonefs: Fix error message in zonefs_file_dio_append()")
aa7f243f32e1 ("zonefs: Separate zone information from inode information")
34422914dc00 ("zonefs: Reduce struct zonefs_inode_info size")
46a9c526eef7 ("zonefs: Simplify IO error handling")
4008e2a0b01a ("zonefs: Reorganize code")
a608da3bd730 ("zonefs: Detect append writes at invalid locations")
db58653ce0c7 ("zonefs: Fix active zone accounting")
7dd12d65ac64 ("zonefs: fix zone report size in __zonefs_io_error()")
8745889a7fd0 ("Merge tag 'iomap-6.0-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 88b170088ad2c3e27086fe35769aa49f8a512564 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Date: Mon, 20 Mar 2023 22:49:15 +0900
Subject: [PATCH] zonefs: Fix error message in zonefs_file_dio_append()
Since the expected write location in a sequential file is always at the
end of the file (append write), when an invalid write append location is
detected in zonefs_file_dio_append(), print the invalid written location
instead of the expected write location.
Fixes: a608da3bd730 ("zonefs: Detect append writes at invalid locations")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani(a)oracle.com>
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index a545a6d9a32e..617e4f9db42e 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -426,7 +426,7 @@ static ssize_t zonefs_file_dio_append(struct kiocb *iocb, struct iov_iter *from)
if (bio->bi_iter.bi_sector != wpsector) {
zonefs_warn(inode->i_sb,
"Corrupted write pointer %llu for zone at %llu\n",
- wpsector, z->z_sector);
+ bio->bi_iter.bi_sector, z->z_sector);
ret = -EIO;
}
}
SCI IP on RZ/G2L alike SoCs do not need regshift compared to other SCI
IPs on the SH platform. Currently, it does regshift and configuring Rx
wrongly. Drop adding regshift for RZ/G2L alike SoCs.
Fixes: dfc80387aefb ("serial: sh-sci: Compute the regshift value for SCI ports")
Cc: stable(a)vger.kernel.org
Signed-off-by: Biju Das <biju.das.jz(a)bp.renesas.com>
---
v3->v4:
* Updated the fixes tag
* Replaced sci_port->is_rz_sci with dev->dev.of_node as regshift are only needed
for sh770x/sh7750/sh7760, which don't use DT yet.
* Dropped is_rz_sci variable from struct sci_port.
v3:
* New patch.
---
drivers/tty/serial/sh-sci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c
index 616041faab55..15954ca3e9dc 100644
--- a/drivers/tty/serial/sh-sci.c
+++ b/drivers/tty/serial/sh-sci.c
@@ -2937,7 +2937,7 @@ static int sci_init_single(struct platform_device *dev,
port->flags = UPF_FIXED_PORT | UPF_BOOT_AUTOCONF | p->flags;
port->fifosize = sci_port->params->fifosize;
- if (port->type == PORT_SCI) {
+ if (port->type == PORT_SCI && !dev->dev.of_node) {
if (sci_port->reg_size >= 0x20)
port->regshift = 2;
else
--
2.25.1
How are you
I want to inform you that I have succeeded in transferring
the huge amount of funds under the cooperation of the new
partner from London and I have written a Bank Draft of $1.9M for
you.
Have you received it? In-case you have not, Contact Mr.
David Lawrence And Ask him for the Bank draft which I kept
for Compensation okay. His email address
(davidllawrence@consultant.com)Phone+:+1(945)212-0126
Mrs. Ester Nelson Philipsxxxxx
Currently, with VHE, KVM enables the EL0 event counting for the
guest on vcpu_load() or KVM enables it as a part of the PMU
register emulation process, when needed. However, in the migration
case (with VHE), the same handling is lacking. So, enable it on the
first KVM_RUN with VHE (after the migration) when needed.
Fixes: d0c94c49792c ("KVM: arm64: Restore PMU configuration on first run")
Cc: stable(a)vger.kernel.org
Signed-off-by: Reiji Watanabe <reijiw(a)google.com>
---
arch/arm64/kvm/pmu-emul.c | 1 +
arch/arm64/kvm/sys_regs.c | 1 -
2 files changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index c243b10f3e15..5eca0cdd961d 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -558,6 +558,7 @@ void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
for_each_set_bit(i, &mask, 32)
kvm_pmu_set_pmc_value(kvm_vcpu_idx_to_pmc(vcpu, i), 0, true);
}
+ kvm_vcpu_pmu_restore_guest(vcpu);
}
static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1b2c161120be..34688918c811 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -794,7 +794,6 @@ static bool access_pmcr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
if (!kvm_supports_32bit_el0())
val |= ARMV8_PMU_PMCR_LC;
kvm_pmu_handle_pmcr(vcpu, val);
- kvm_vcpu_pmu_restore_guest(vcpu);
} else {
/* PMCR.P & PMCR.C are RAZ */
val = __vcpu_sys_reg(vcpu, PMCR_EL0)
--
2.40.0.348.gf938b09366-goog
The quilt patch titled
Subject: mm: kfence: fix handling discontiguous page
has been removed from the -mm tree. Its filename was
mm-kfence-fix-handling-discontiguous-page.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: kfence: fix handling discontiguous page
Date: Thu, 23 Mar 2023 10:50:03 +0800
The struct pages could be discontiguous when the kfence pool is allocated
via alloc_contig_pages() with CONFIG_SPARSEMEM and
!CONFIG_SPARSEMEM_VMEMMAP.
This may result in setting PG_slab and memcg_data to a arbitrary
address (may be not used as a struct page), which in the worst case
might corrupt the kernel.
So the iteration should use nth_page().
Link: https://lkml.kernel.org/r/20230323025003.94447-1-songmuchun@bytedance.com
Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Marco Elver <elver(a)google.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: SeongJae Park <sjpark(a)amazon.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kfence/core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/kfence/core.c~mm-kfence-fix-handling-discontiguous-page
+++ a/mm/kfence/core.c
@@ -556,7 +556,7 @@ static unsigned long kfence_init_pool(vo
* enters __slab_free() slow-path.
*/
for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) {
- struct slab *slab = page_slab(&pages[i]);
+ struct slab *slab = page_slab(nth_page(pages, i));
if (!i || (i % 2))
continue;
@@ -602,7 +602,7 @@ static unsigned long kfence_init_pool(vo
reset_slab:
for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) {
- struct slab *slab = page_slab(&pages[i]);
+ struct slab *slab = page_slab(nth_page(pages, i));
if (!i || (i % 2))
continue;
_
Patches currently in -mm which might be from songmuchun(a)bytedance.com are
mm-hugetlb_vmemmap-simplify-hugetlb_vmemmap_init-a-bit.patch
The quilt patch titled
Subject: mm: kfence: fix PG_slab and memcg_data clearing
has been removed from the -mm tree. Its filename was
mm-kfence-fix-pg_slab-and-memcg_data-clearing.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: kfence: fix PG_slab and memcg_data clearing
Date: Mon, 20 Mar 2023 11:00:59 +0800
It does not reset PG_slab and memcg_data when KFENCE fails to initialize
kfence pool at runtime. It is reporting a "Bad page state" message when
kfence pool is freed to buddy. The checking of whether it is a compound
head page seems unnecessary since we already guarantee this when
allocating kfence pool. Remove the check to simplify the code.
Link: https://lkml.kernel.org/r/20230320030059.20189-1-songmuchun@bytedance.com
Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: SeongJae Park <sjpark(a)amazon.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kfence/core.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)
--- a/mm/kfence/core.c~mm-kfence-fix-pg_slab-and-memcg_data-clearing
+++ a/mm/kfence/core.c
@@ -561,10 +561,6 @@ static unsigned long kfence_init_pool(vo
if (!i || (i % 2))
continue;
- /* Verify we do not have a compound head page. */
- if (WARN_ON(compound_head(&pages[i]) != &pages[i]))
- return addr;
-
__folio_set_slab(slab_folio(slab));
#ifdef CONFIG_MEMCG
slab->memcg_data = (unsigned long)&kfence_metadata[i / 2 - 1].objcg |
@@ -597,12 +593,26 @@ static unsigned long kfence_init_pool(vo
/* Protect the right redzone. */
if (unlikely(!kfence_protect(addr + PAGE_SIZE)))
- return addr;
+ goto reset_slab;
addr += 2 * PAGE_SIZE;
}
return 0;
+
+reset_slab:
+ for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) {
+ struct slab *slab = page_slab(&pages[i]);
+
+ if (!i || (i % 2))
+ continue;
+#ifdef CONFIG_MEMCG
+ slab->memcg_data = 0;
+#endif
+ __folio_clear_slab(slab_folio(slab));
+ }
+
+ return addr;
}
static bool __init kfence_init_pool_early(void)
@@ -632,16 +642,6 @@ static bool __init kfence_init_pool_earl
* fails for the first page, and therefore expect addr==__kfence_pool in
* most failure cases.
*/
- for (char *p = (char *)addr; p < __kfence_pool + KFENCE_POOL_SIZE; p += PAGE_SIZE) {
- struct slab *slab = virt_to_slab(p);
-
- if (!slab)
- continue;
-#ifdef CONFIG_MEMCG
- slab->memcg_data = 0;
-#endif
- __folio_clear_slab(slab_folio(slab));
- }
memblock_free_late(__pa(addr), KFENCE_POOL_SIZE - (addr - (unsigned long)__kfence_pool));
__kfence_pool = NULL;
return false;
_
Patches currently in -mm which might be from songmuchun(a)bytedance.com are
mm-hugetlb_vmemmap-simplify-hugetlb_vmemmap_init-a-bit.patch
The quilt patch titled
Subject: fsdax: dedupe should compare the min of two iters' length
has been removed from the -mm tree. Its filename was
fsdax-dedupe-should-compare-the-min-of-two-iters-length.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Shiyang Ruan <ruansy.fnst(a)fujitsu.com>
Subject: fsdax: dedupe should compare the min of two iters' length
Date: Wed, 22 Mar 2023 07:25:58 +0000
In an dedupe comparison iter loop, the length of iomap_iter decreases
because it implies the remaining length after each iteration.
The dedupe command will fail with -EIO if the range is larger than one
page size and not aligned to the page size. Also report warning in dmesg:
[ 4338.498374] ------------[ cut here ]------------
[ 4338.498689] WARNING: CPU: 3 PID: 1415645 at fs/iomap/iter.c:16
...
The compare function should use the min length of the current iters,
not the total length.
Link: https://lkml.kernel.org/r/1679469958-2-1-git-send-email-ruansy.fnst@fujitsu…
Fixes: 0e79e3736d54 ("fsdax: dedupe: iter two files at the same time")
Signed-off-by: Shiyang Ruan <ruansy.fnst(a)fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong(a)kernel.org>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/dax.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/fs/dax.c~fsdax-dedupe-should-compare-the-min-of-two-iters-length
+++ a/fs/dax.c
@@ -2027,8 +2027,8 @@ int dax_dedupe_file_range_compare(struct
while ((ret = iomap_iter(&src_iter, ops)) > 0 &&
(ret = iomap_iter(&dst_iter, ops)) > 0) {
- compared = dax_range_compare_iter(&src_iter, &dst_iter, len,
- same);
+ compared = dax_range_compare_iter(&src_iter, &dst_iter,
+ min(src_iter.len, dst_iter.len), same);
if (compared < 0)
return ret;
src_iter.processed = dst_iter.processed = compared;
_
Patches currently in -mm which might be from ruansy.fnst(a)fujitsu.com are
fsdax-force-clear-dirty-mark-if-cow.patch
The quilt patch titled
Subject: fsdax: unshare: zero destination if srcmap is HOLE or UNWRITTEN
has been removed from the -mm tree. Its filename was
fsdax-unshare-zero-destination-if-srcmap-is-hole-or-unwritten.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Shiyang Ruan <ruansy.fnst(a)fujitsu.com>
Subject: fsdax: unshare: zero destination if srcmap is HOLE or UNWRITTEN
Date: Wed, 22 Mar 2023 11:11:09 +0000
unshare copies data from source to destination. But if the source is
HOLE or UNWRITTEN extents, we should zero the destination, otherwise
the HOLE or UNWRITTEN part will be user-visible old data of the new
allocated extent.
Found by running generic/649 while mounting with -o dax=always on pmem.
Link: https://lkml.kernel.org/r/1679483469-2-1-git-send-email-ruansy.fnst@fujitsu…
Fixes: d984648e428b ("fsdax,xfs: port unshare to fsdax")
Signed-off-by: Shiyang Ruan <ruansy.fnst(a)fujitsu.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Darrick J. Wong <djwong(a)kernel.org>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/dax.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
--- a/fs/dax.c~fsdax-unshare-zero-destination-if-srcmap-is-hole-or-unwritten
+++ a/fs/dax.c
@@ -1258,15 +1258,20 @@ static s64 dax_unshare_iter(struct iomap
/* don't bother with blocks that are not shared to start with */
if (!(iomap->flags & IOMAP_F_SHARED))
return length;
- /* don't bother with holes or unwritten extents */
- if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
- return length;
id = dax_read_lock();
ret = dax_iomap_direct_access(iomap, pos, length, &daddr, NULL);
if (ret < 0)
goto out_unlock;
+ /* zero the distance if srcmap is HOLE or UNWRITTEN */
+ if (srcmap->flags & IOMAP_F_SHARED || srcmap->type == IOMAP_UNWRITTEN) {
+ memset(daddr, 0, length);
+ dax_flush(iomap->dax_dev, daddr, length);
+ ret = length;
+ goto out_unlock;
+ }
+
ret = dax_iomap_direct_access(srcmap, pos, length, &saddr, NULL);
if (ret < 0)
goto out_unlock;
_
Patches currently in -mm which might be from ruansy.fnst(a)fujitsu.com are
fsdax-force-clear-dirty-mark-if-cow.patch
On Tue, Mar 28, 2023 at 05:36:27PM +0800, Min Li wrote:
> Userspace can guess the id value and try to race oa_config object creation
> with config remove, resulting in a use-after-free if we dereference the
> object after unlocking the metrics_lock. For that reason, unlocking the
> metrics_lock must be done after we are done dereferencing the object.
>
> Signed-off-by: Min Li <lm0963hack(a)gmail.com>
I think we should also add
Fixes: f89823c21224 ("drm/i915/perf: Implement I915_PERF_ADD/REMOVE_CONFIG interface")
Cc: <stable(a)vger.kernel.org> # v4.14+
Andi
> ---
> drivers/gpu/drm/i915/i915_perf.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 824a34ec0b83..93748ca2c5da 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -4634,13 +4634,13 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
> err = oa_config->id;
> goto sysfs_err;
> }
> -
> - mutex_unlock(&perf->metrics_lock);
> + id = oa_config->id;
>
> drm_dbg(&perf->i915->drm,
> "Added config %s id=%i\n", oa_config->uuid, oa_config->id);
> + mutex_unlock(&perf->metrics_lock);
>
> - return oa_config->id;
> + return id;
>
> sysfs_err:
> mutex_unlock(&perf->metrics_lock);
> --
> 2.25.1
The patch titled
Subject: mm: take a page reference when removing device exclusive entries
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-take-a-page-reference-when-removing-device-exclusive-entries.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Alistair Popple <apopple(a)nvidia.com>
Subject: mm: take a page reference when removing device exclusive entries
Date: Tue, 28 Mar 2023 13:14:34 +1100
Device exclusive page table entries are used to prevent CPU access to a
page whilst it is being accessed from a device. Typically this is used to
implement atomic operations when the underlying bus does not support
atomic access. When a CPU thread encounters a device exclusive entry it
locks the page and restores the original entry after calling mmu notifiers
to signal drivers that exclusive access is no longer available.
The device exclusive entry holds a reference to the page making it safe to
access the struct page whilst the entry is present. However the fault
handling code does not hold the PTL when taking the page lock. This means
if there are multiple threads faulting concurrently on the device
exclusive entry one will remove the entry whilst others will wait on the
page lock without holding a reference.
This can lead to threads locking or waiting on a page with a zero
refcount. Whilst mmap_lock prevents the pages getting freed via munmap()
they may still be freed by a migration. This leads to warnings such as
PAGE_FLAGS_CHECK_AT_FREE due to the page being locked when the refcount
drops to zero. Note that during removal of the device exclusive entry the
PTE is currently re-checked under the PTL so no futher bad page accesses
occur once it is locked.
Link: https://lkml.kernel.org/r/20230328021434.292971-1-apopple@nvidia.com
Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
Signed-off-by: Alistair Popple <apopple(a)nvidia.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
--- a/mm/memory.c~mm-take-a-page-reference-when-removing-device-exclusive-entries
+++ a/mm/memory.c
@@ -3563,8 +3563,19 @@ static vm_fault_t remove_device_exclusiv
struct vm_area_struct *vma = vmf->vma;
struct mmu_notifier_range range;
- if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags))
+ /*
+ * We need a page reference to lock the page because we don't hold the
+ * PTL so a racing thread can remove the device-exclusive entry and
+ * unmap the page. If the page is free the entry must have been
+ * removed already.
+ */
+ if (!get_page_unless_zero(vmf->page))
+ return 0;
+
+ if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags)) {
+ put_page(vmf->page);
return VM_FAULT_RETRY;
+ }
mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0,
vma->vm_mm, vmf->address & PAGE_MASK,
(vmf->address & PAGE_MASK) + PAGE_SIZE, NULL);
@@ -3577,6 +3588,7 @@ static vm_fault_t remove_device_exclusiv
pte_unmap_unlock(vmf->pte, vmf->ptl);
folio_unlock(folio);
+ put_page(vmf->page);
mmu_notifier_invalidate_range_end(&range);
return 0;
_
Patches currently in -mm which might be from apopple(a)nvidia.com are
mm-take-a-page-reference-when-removing-device-exclusive-entries.patch
From: Eric Biggers <ebiggers(a)google.com>
commit a075bacde257f755bea0e53400c9f1cdd1b8e8e6 upstream.
[Please apply to 5.10-stable and 5.4-stable.]
The full pagecache drop at the end of FS_IOC_ENABLE_VERITY is causing
performance problems and is hindering adoption of fsverity. It was
intended to solve a race condition where unverified pages might be left
in the pagecache. But actually it doesn't solve it fully.
Since the incomplete solution for this race condition has too much
performance impact for it to be worth it, let's remove it for now.
Fixes: 3fda4c617e84 ("fs-verity: implement FS_IOC_ENABLE_VERITY ioctl")
Cc: stable(a)vger.kernel.org
Reviewed-by: Victor Hsieh <victorhsieh(a)google.com>
Link: https://lore.kernel.org/r/20230314235332.50270-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
fs/verity/enable.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/fs/verity/enable.c b/fs/verity/enable.c
index 734862e608fd3..5ceae66e1ae02 100644
--- a/fs/verity/enable.c
+++ b/fs/verity/enable.c
@@ -391,25 +391,27 @@ int fsverity_ioctl_enable(struct file *filp, const void __user *uarg)
goto out_drop_write;
err = enable_verity(filp, &arg);
- if (err)
- goto out_allow_write_access;
/*
- * Some pages of the file may have been evicted from pagecache after
- * being used in the Merkle tree construction, then read into pagecache
- * again by another process reading from the file concurrently. Since
- * these pages didn't undergo verification against the file measurement
- * which fs-verity now claims to be enforcing, we have to wipe the
- * pagecache to ensure that all future reads are verified.
+ * We no longer drop the inode's pagecache after enabling verity. This
+ * used to be done to try to avoid a race condition where pages could be
+ * evicted after being used in the Merkle tree construction, then
+ * re-instantiated by a concurrent read. Such pages are unverified, and
+ * the backing storage could have filled them with different content, so
+ * they shouldn't be used to fulfill reads once verity is enabled.
+ *
+ * But, dropping the pagecache has a big performance impact, and it
+ * doesn't fully solve the race condition anyway. So for those reasons,
+ * and also because this race condition isn't very important relatively
+ * speaking (especially for small-ish files, where the chance of a page
+ * being used, evicted, *and* re-instantiated all while enabling verity
+ * is quite small), we no longer drop the inode's pagecache.
*/
- filemap_write_and_wait(inode->i_mapping);
- invalidate_inode_pages2(inode->i_mapping);
/*
* allow_write_access() is needed to pair with deny_write_access().
* Regardless, the filesystem won't allow writing to verity files.
*/
-out_allow_write_access:
allow_write_access(filp);
out_drop_write:
mnt_drop_write_file(filp);
--
2.40.0
From: Eric Biggers <ebiggers(a)google.com>
commit a075bacde257f755bea0e53400c9f1cdd1b8e8e6 upstream.
[Please apply to 6.1-stable and 5.15-stable.]
The full pagecache drop at the end of FS_IOC_ENABLE_VERITY is causing
performance problems and is hindering adoption of fsverity. It was
intended to solve a race condition where unverified pages might be left
in the pagecache. But actually it doesn't solve it fully.
Since the incomplete solution for this race condition has too much
performance impact for it to be worth it, let's remove it for now.
Fixes: 3fda4c617e84 ("fs-verity: implement FS_IOC_ENABLE_VERITY ioctl")
Cc: stable(a)vger.kernel.org
Reviewed-by: Victor Hsieh <victorhsieh(a)google.com>
Link: https://lore.kernel.org/r/20230314235332.50270-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
fs/verity/enable.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/fs/verity/enable.c b/fs/verity/enable.c
index df6b499bf6a14..400c264bf8930 100644
--- a/fs/verity/enable.c
+++ b/fs/verity/enable.c
@@ -390,25 +390,27 @@ int fsverity_ioctl_enable(struct file *filp, const void __user *uarg)
goto out_drop_write;
err = enable_verity(filp, &arg);
- if (err)
- goto out_allow_write_access;
/*
- * Some pages of the file may have been evicted from pagecache after
- * being used in the Merkle tree construction, then read into pagecache
- * again by another process reading from the file concurrently. Since
- * these pages didn't undergo verification against the file digest which
- * fs-verity now claims to be enforcing, we have to wipe the pagecache
- * to ensure that all future reads are verified.
+ * We no longer drop the inode's pagecache after enabling verity. This
+ * used to be done to try to avoid a race condition where pages could be
+ * evicted after being used in the Merkle tree construction, then
+ * re-instantiated by a concurrent read. Such pages are unverified, and
+ * the backing storage could have filled them with different content, so
+ * they shouldn't be used to fulfill reads once verity is enabled.
+ *
+ * But, dropping the pagecache has a big performance impact, and it
+ * doesn't fully solve the race condition anyway. So for those reasons,
+ * and also because this race condition isn't very important relatively
+ * speaking (especially for small-ish files, where the chance of a page
+ * being used, evicted, *and* re-instantiated all while enabling verity
+ * is quite small), we no longer drop the inode's pagecache.
*/
- filemap_write_and_wait(inode->i_mapping);
- invalidate_inode_pages2(inode->i_mapping);
/*
* allow_write_access() is needed to pair with deny_write_access().
* Regardless, the filesystem won't allow writing to verity files.
*/
-out_allow_write_access:
allow_write_access(filp);
out_drop_write:
mnt_drop_write_file(filp);
--
2.40.0
The fix for XSA-423 introduced a bug which resulted in loss of network
connection in some configurations.
The first patch is fixing the issue, while the second one is removing
a test which isn't needed. The third patch is making error messages
more uniform.
Changes in V2:
- add patch 3
- comment addressed (patch 1)
Juergen Gross (3):
xen/netback: don't do grant copy across page boundary
xen/netback: remove not needed test in xenvif_tx_build_gops()
xen/netback: use same error messages for same errors
drivers/net/xen-netback/common.h | 2 +-
drivers/net/xen-netback/netback.c | 37 ++++++++++++++++++++++---------
2 files changed, 28 insertions(+), 11 deletions(-)
--
2.35.3
Greg,
Chandan is preparing a series of backports from v5.11 to 5.4.y.
These two backports were selected by Chandan for 5.4.y, but are
currently missing from 5.10.y.
Specifically, patch #2 fixes a problem seen in the wild on UEK
and the UEK kernels already carry this patch.
The patches have gone through the usual xfs test/review routine.
Thanks,
Amir.
Brian Foster (1):
xfs: don't reuse busy extents on extent trim
Darrick J. Wong (1):
xfs: shut down the filesystem if we screw up quota reservation
fs/xfs/xfs_extent_busy.c | 14 --------------
fs/xfs/xfs_trans_dquot.c | 13 ++++++++++---
2 files changed, 10 insertions(+), 17 deletions(-)
--
2.34.1
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 90410bcf873cf05f54a32183afff0161f44f9715
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '16793134471133(a)kroah.com' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
90410bcf873c ("ocfs2: fix data corruption after failed write")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 90410bcf873cf05f54a32183afff0161f44f9715 Mon Sep 17 00:00:00 2001
From: Jan Kara via Ocfs2-devel <ocfs2-devel(a)oss.oracle.com>
Date: Thu, 2 Mar 2023 16:38:43 +0100
Subject: [PATCH] ocfs2: fix data corruption after failed write
When buffered write fails to copy data into underlying page cache page,
ocfs2_write_end_nolock() just zeroes out and dirties the page. This can
leave dirty page beyond EOF and if page writeback tries to write this page
before write succeeds and expands i_size, page gets into inconsistent
state where page dirty bit is clear but buffer dirty bits stay set
resulting in page data never getting written and so data copied to the
page is lost. Fix the problem by invalidating page beyond EOF after
failed write.
Link: https://lkml.kernel.org/r/20230302153843.18499-1-jack@suse.cz
Fixes: 6dbf7bb55598 ("fs: Don't invalidate page buffers in block_write_full_page()")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 1d65f6ef00ca..0394505fdce3 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -1977,11 +1977,26 @@ int ocfs2_write_end_nolock(struct address_space *mapping,
}
if (unlikely(copied < len) && wc->w_target_page) {
+ loff_t new_isize;
+
if (!PageUptodate(wc->w_target_page))
copied = 0;
- ocfs2_zero_new_buffers(wc->w_target_page, start+copied,
- start+len);
+ new_isize = max_t(loff_t, i_size_read(inode), pos + copied);
+ if (new_isize > page_offset(wc->w_target_page))
+ ocfs2_zero_new_buffers(wc->w_target_page, start+copied,
+ start+len);
+ else {
+ /*
+ * When page is fully beyond new isize (data copy
+ * failed), do not bother zeroing the page. Invalidate
+ * it instead so that writeback does not get confused
+ * put page & buffer dirty bits into inconsistent
+ * state.
+ */
+ block_invalidate_folio(page_folio(wc->w_target_page),
+ 0, PAGE_SIZE);
+ }
}
if (wc->w_target_page)
flush_dcache_page(wc->w_target_page);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 1c86a188e03156223a34d09ce290b49bd4dd0403
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '16800049118459(a)kroah.com' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
1c86a188e031 ("mm: kfence: fix using kfence_metadata without initialization in show_object()")
b33f778bba5e ("kfence: alloc kfence_pool after system startup")
698361bca2d5 ("kfence: allow re-enabling KFENCE after system startup")
07e8481d3c38 ("kfence: always use static branches to guard kfence_alloc()")
08f6b10630f2 ("kfence: limit currently covered allocations when pool nearly full")
a9ab52bbcb52 ("kfence: move saving stack trace of allocations into __kfence_alloc()")
9a19aeb56650 ("kfence: count unexpectedly skipped allocations")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1c86a188e03156223a34d09ce290b49bd4dd0403 Mon Sep 17 00:00:00 2001
From: Muchun Song <muchun.song(a)linux.dev>
Date: Wed, 15 Mar 2023 11:44:41 +0800
Subject: [PATCH] mm: kfence: fix using kfence_metadata without initialization
in show_object()
The variable kfence_metadata is initialized in kfence_init_pool(), then,
it is not initialized if kfence is disabled after booting. In this case,
kfence_metadata will be used (e.g. ->lock and ->state fields) without
initialization when reading /sys/kernel/debug/kfence/objects. There will
be a warning if you enable CONFIG_DEBUG_SPINLOCK. Fix it by creating
debugfs files when necessary.
Link: https://lkml.kernel.org/r/20230315034441.44321-1-songmuchun@bytedance.com
Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Tested-by: Marco Elver <elver(a)google.com>
Reviewed-by: Marco Elver <elver(a)google.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: SeongJae Park <sjpark(a)amazon.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 5349c37a5dac..79c94ee55f97 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -726,10 +726,14 @@ static const struct seq_operations objects_sops = {
};
DEFINE_SEQ_ATTRIBUTE(objects);
-static int __init kfence_debugfs_init(void)
+static int kfence_debugfs_init(void)
{
- struct dentry *kfence_dir = debugfs_create_dir("kfence", NULL);
+ struct dentry *kfence_dir;
+ if (!READ_ONCE(kfence_enabled))
+ return 0;
+
+ kfence_dir = debugfs_create_dir("kfence", NULL);
debugfs_create_file("stats", 0444, kfence_dir, NULL, &stats_fops);
debugfs_create_file("objects", 0400, kfence_dir, NULL, &objects_fops);
return 0;
@@ -883,6 +887,8 @@ static int kfence_init_late(void)
}
kfence_init_enable();
+ kfence_debugfs_init();
+
return 0;
}
[Public]
Hi,
For a product that has the IP GC 11.0.4, there is a lone error message that comes up during bootup related to some missing support for KFD on kernel 6.1.20.
kfd kfd: amdgpu: GC IP 0b0004 not supported in kfd
This is fixed by this series of commits that landed in 6.2 that fixes KFD support on this product (and also fixes a warning).
fd72e2cb2f9d ("drm/amdkfd: introduce dummy cache info for property asic")
c0cc999f3c32 ("drm/amdkfd: Fix the warning of array-index-out-of-bounds")
88c21c2b56aa ("drm/amdkfd: add GC 11.0.4 KFD support")
Can you please bring to 6.1.y?
Thanks,
The fix for XSA-423 introduced a bug which resulted in loss of network
connection in some configurations.
The first patch is fixing the issue, while the second one is removing
a test which isn't needed.
Juergen Gross (2):
xen/netback: don't do grant copy across page boundary
xen/netback: remove not needed test in xenvif_tx_build_gops()
drivers/net/xen-netback/common.h | 2 +-
drivers/net/xen-netback/netback.c | 29 +++++++++++++++++++++++------
2 files changed, 24 insertions(+), 7 deletions(-)
--
2.35.3
Make sure to destroy the workqueue also in case of early errors during
bind (e.g. a subcomponent failing to bind).
Since commit c3b790ea07a1 ("drm: Manage drm_mode_config_init with
drmm_") the mode config will be freed when the drm device is released
also when using the legacy interface, but add an explicit cleanup for
consistency and to facilitate backporting.
Fixes: 060530f1ea67 ("drm/msm: use componentised device support")
Cc: stable(a)vger.kernel.org # 3.15
Cc: Rob Clark <robdclark(a)gmail.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/gpu/drm/msm/msm_drv.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index ac3b77dbfacc..73c597565f99 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -458,7 +458,7 @@ static int msm_drm_init(struct device *dev, const struct drm_driver *drv)
ret = msm_init_vram(ddev);
if (ret)
- goto err_put_dev;
+ goto err_cleanup_mode_config;
/* Bind all our sub-components: */
ret = component_bind_all(dev, ddev);
@@ -563,6 +563,9 @@ static int msm_drm_init(struct device *dev, const struct drm_driver *drv)
err_deinit_vram:
msm_deinit_vram(ddev);
+err_cleanup_mode_config:
+ drm_mode_config_cleanup(ddev);
+ destroy_workqueue(priv->wq);
err_put_dev:
drm_dev_put(ddev);
--
2.39.2
在 2023/3/5 12:02, Sasha Levin 写道:
> This is a note to let you know that I've just added the patch titled
>
> sched/fair: sanitize vruntime of entity being placed
>
> to the 4.14-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> sched-fair-sanitize-vruntime-of-entity-being-placed.patch
> and it can be found in the queue-4.14 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit 38247e1de3305a6ef644404ac818bc6129440eae
Hi,
This patch has significant impact on the hackbench.throughput [1].
Please don't backport this patch.
[1] https://lore.kernel.org/lkml/202302211553.9738f304-yujie.liu@intel.com/T/#u
Thanks.
Zhang Qiao.
> Author: Zhang Qiao <zhangqiao22(a)huawei.com>
> Date: Mon Jan 30 13:22:16 2023 +0100
>
> sched/fair: sanitize vruntime of entity being placed
>
> [ Upstream commit 829c1651e9c4a6f78398d3e67651cef9bb6b42cc ]
>
> When a scheduling entity is placed onto cfs_rq, its vruntime is pulled
> to the base level (around cfs_rq->min_vruntime), so that the entity
> doesn't gain extra boost when placed backwards.
>
> However, if the entity being placed wasn't executed for a long time, its
> vruntime may get too far behind (e.g. while cfs_rq was executing a
> low-weight hog), which can inverse the vruntime comparison due to s64
> overflow. This results in the entity being placed with its original
> vruntime way forwards, so that it will effectively never get to the cpu.
>
> To prevent that, ignore the vruntime of the entity being placed if it
> didn't execute for much longer than the characteristic sheduler time
> scale.
>
> [rkagan: formatted, adjusted commit log, comments, cutoff value]
> Signed-off-by: Zhang Qiao <zhangqiao22(a)huawei.com>
> Co-developed-by: Roman Kagan <rkagan(a)amazon.de>
> Signed-off-by: Roman Kagan <rkagan(a)amazon.de>
> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
> Link: https://lkml.kernel.org/r/20230130122216.3555094-1-rkagan@amazon.de
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3ff60230710c9..afa21e43477fa 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3615,6 +3615,7 @@ static void
> place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> {
> u64 vruntime = cfs_rq->min_vruntime;
> + u64 sleep_time;
>
> /*
> * The 'current' period is already promised to the current tasks,
> @@ -3639,8 +3640,18 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> vruntime -= thresh;
> }
>
> - /* ensure we never gain time by being placed backwards. */
> - se->vruntime = max_vruntime(se->vruntime, vruntime);
> + /*
> + * Pull vruntime of the entity being placed to the base level of
> + * cfs_rq, to prevent boosting it if placed backwards. If the entity
> + * slept for a long time, don't even try to compare its vruntime with
> + * the base as it may be too far off and the comparison may get
> + * inversed due to s64 overflow.
> + */
> + sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> + if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
> + se->vruntime = vruntime;
> + else
> + se->vruntime = max_vruntime(se->vruntime, vruntime);
> }
>
> static void check_enqueue_throttle(struct cfs_rq *cfs_rq);
> .
>
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x d9a02e016aaf5a57fb44e9a5e6da8ccd3b9e2e70
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '168000726237133(a)kroah.com' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
d9a02e016aaf ("dm crypt: avoid accessing uninitialized tasklet")
d3703ef33129 ("dm crypt: use in_hardirq() instead of deprecated in_irq()")
c87a95dc28b1 ("dm crypt: defer decryption to a tasklet if interrupts disabled")
8e14f610159d ("dm crypt: do not call bio_endio() from the dm-crypt tasklet")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d9a02e016aaf5a57fb44e9a5e6da8ccd3b9e2e70 Mon Sep 17 00:00:00 2001
From: Mike Snitzer <snitzer(a)kernel.org>
Date: Wed, 8 Mar 2023 14:39:54 -0500
Subject: [PATCH] dm crypt: avoid accessing uninitialized tasklet
When neither "no_read_workqueue" nor "no_write_workqueue" are enabled,
tasklet_trylock() in crypt_dec_pending() may still return false due to
an uninitialized state, and dm-crypt will unnecessarily do io completion
in io_queue workqueue instead of current context.
Fix this by adding an 'in_tasklet' flag to dm_crypt_io struct and
initialize it to false in crypt_io_init(). Set this flag to true in
kcryptd_queue_crypt() before calling tasklet_schedule(). If set
crypt_dec_pending() will punt io completion to a workqueue.
This also nicely avoids the tasklet_trylock/unlock hack when tasklets
aren't in use.
Fixes: 8e14f610159d ("dm crypt: do not call bio_endio() from the dm-crypt tasklet")
Cc: stable(a)vger.kernel.org
Reported-by: Hou Tao <houtao1(a)huawei.com>
Suggested-by: Ignat Korchagin <ignat(a)cloudflare.com>
Reviewed-by: Ignat Korchagin <ignat(a)cloudflare.com>
Signed-off-by: Mike Snitzer <snitzer(a)kernel.org>
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index faba1be572f9..2764b4ea18a3 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -72,7 +72,9 @@ struct dm_crypt_io {
struct crypt_config *cc;
struct bio *base_bio;
u8 *integrity_metadata;
- bool integrity_metadata_from_pool;
+ bool integrity_metadata_from_pool:1;
+ bool in_tasklet:1;
+
struct work_struct work;
struct tasklet_struct tasklet;
@@ -1731,6 +1733,7 @@ static void crypt_io_init(struct dm_crypt_io *io, struct crypt_config *cc,
io->ctx.r.req = NULL;
io->integrity_metadata = NULL;
io->integrity_metadata_from_pool = false;
+ io->in_tasklet = false;
atomic_set(&io->io_pending, 0);
}
@@ -1777,14 +1780,13 @@ static void crypt_dec_pending(struct dm_crypt_io *io)
* our tasklet. In this case we need to delay bio_endio()
* execution to after the tasklet is done and dequeued.
*/
- if (tasklet_trylock(&io->tasklet)) {
- tasklet_unlock(&io->tasklet);
- bio_endio(base_bio);
+ if (io->in_tasklet) {
+ INIT_WORK(&io->work, kcryptd_io_bio_endio);
+ queue_work(cc->io_queue, &io->work);
return;
}
- INIT_WORK(&io->work, kcryptd_io_bio_endio);
- queue_work(cc->io_queue, &io->work);
+ bio_endio(base_bio);
}
/*
@@ -2233,6 +2235,7 @@ static void kcryptd_queue_crypt(struct dm_crypt_io *io)
* it is being executed with irqs disabled.
*/
if (in_hardirq() || irqs_disabled()) {
+ io->in_tasklet = true;
tasklet_init(&io->tasklet, kcryptd_crypt_tasklet, (unsigned long)&io->work);
tasklet_schedule(&io->tasklet);
return;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 1adab2922c58e7ff4fa9f0b43695079402cce876
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '1680007196183163(a)kroah.com' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
1adab2922c58 ("bus: imx-weim: fix branch condition evaluates to a garbage value")
e6cb5408289f ("bus: imx-weim: add DT overlay support for WEIM bus")
3b1261fb72c7 ("bus: imx-weim: remove incorrect __init annotations")
4a92f07816ba ("bus: imx-weim: use module_platform_driver()")
77266e722fea ("bus: imx-weim: optionally enable burst clock mode")
c7995bcb36ef ("bus: imx-weim: guard against timing configuration conflicts")
8b8cb52af34d ("bus: imx-weim: support multiple address ranges per child node")
b1a23445364d ("bus: imx-weim: drop unnecessary DT node name NULL check")
d8dfa59f5a51 ("bus: imx-weim: Remove VLA usage")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1adab2922c58e7ff4fa9f0b43695079402cce876 Mon Sep 17 00:00:00 2001
From: Ivan Bornyakov <i.bornyakov(a)metrotek.ru>
Date: Mon, 6 Mar 2023 16:25:26 +0300
Subject: [PATCH] bus: imx-weim: fix branch condition evaluates to a garbage
value
If bus type is other than imx50_weim_devtype and have no child devices,
variable 'ret' in function weim_parse_dt() will not be initialized, but
will be used as branch condition and return value. Fix this by
initializing 'ret' with 0.
This was discovered with help of clang-analyzer, but the situation is
quite possible in real life.
Fixes: 52c47b63412b ("bus: imx-weim: improve error handling upon child probe-failure")
Signed-off-by: Ivan Bornyakov <i.bornyakov(a)metrotek.ru>
Cc: stable(a)vger.kernel.org
Reviewed-by: Fabio Estevam <festevam(a)gmail.com>
Signed-off-by: Shawn Guo <shawnguo(a)kernel.org>
diff --git a/drivers/bus/imx-weim.c b/drivers/bus/imx-weim.c
index 2a6b4f676458..36d42484142a 100644
--- a/drivers/bus/imx-weim.c
+++ b/drivers/bus/imx-weim.c
@@ -204,8 +204,8 @@ static int weim_parse_dt(struct platform_device *pdev)
const struct of_device_id *of_id = of_match_device(weim_id_table,
&pdev->dev);
const struct imx_weim_devtype *devtype = of_id->data;
+ int ret = 0, have_child = 0;
struct device_node *child;
- int ret, have_child = 0;
struct weim_priv *priv;
void __iomem *base;
u32 reg;