Hi,
Please cherry-pick the following 6 patches to 6.14:
bc3e5f48b7ee021371dc37297678f7089be6ce28 accel/ivpu: Use workqueue for IRQ handling
0240fa18d247c99a1967f2fed025296a89a1c5f5 accel/ivpu: Dump only first MMU fault from single context
4480912f3f8b8a1fbb5ae12c5c547fd094ec4197 accel/ivpu: Move parts of MMU event IRQ handling to thread handler
353b8f48390d36b39276ff6af61464ec64cd4d5c accel/ivpu: Fix missing MMU events from reserved SSID
2f5bbea1807a064a1e4c1b385c8cea4f37bb4b17 accel/ivpu: Fix missing MMU events if file_priv is unbound
683e9fa1c885a0cffbc10b459a7eee9df92af1c1 accel/ivpu: Flush pending jobs of device's workqueues
These fix an issue where the host can be overloaded with MMU faults from the NPU, causing other IRQs to be missed and slowing the host down significantly.
They should apply without conflicts.
Thanks,
Jacek
Hi,
On Sun, May 18, 2025 at 06:35:28AM -0400, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> sched_ext: Fix missing rq lock in scx_bpf_cpuperf_set()
>
> to the 6.14-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> sched_ext-fix-missing-rq-lock-in-scx_bpf_cpuperf_set.patch
> and it can be found in the queue-6.14 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
This requires upstream commit 18853ba782bef ("sched_ext: Track currently
locked rq"), since the fix below calls scx_locked_rq().
Thanks,
-Andrea
>
>
>
> commit e0dd90f92931fd4040aee0bf75b348a402464821
> Author: Andrea Righi <arighi(a)nvidia.com>
> Date: Tue Apr 22 10:26:33 2025 +0200
>
> sched_ext: Fix missing rq lock in scx_bpf_cpuperf_set()
>
> [ Upstream commit a11d6784d7316a6c77ca9f14fb1a698ebbb3c1fb ]
>
> scx_bpf_cpuperf_set() can be used to set a performance target level on
> any CPU. However, it doesn't correctly acquire the corresponding rq
> lock, which may lead to unsafe behavior and trigger the following
> warning, due to the lockdep_assert_rq_held() check:
>
> [ 51.713737] WARNING: CPU: 3 PID: 3899 at kernel/sched/sched.h:1512 scx_bpf_cpuperf_set+0x1a0/0x1e0
> ...
> [ 51.713836] Call trace:
> [ 51.713837] scx_bpf_cpuperf_set+0x1a0/0x1e0 (P)
> [ 51.713839] bpf_prog_62d35beb9301601f_bpfland_init+0x168/0x440
> [ 51.713841] bpf__sched_ext_ops_init+0x54/0x8c
> [ 51.713843] scx_ops_enable.constprop.0+0x2c0/0x10f0
> [ 51.713845] bpf_scx_reg+0x18/0x30
> [ 51.713847] bpf_struct_ops_link_create+0x154/0x1b0
> [ 51.713849] __sys_bpf+0x1934/0x22a0
>
> Fix by properly acquiring the rq lock when possible or raising an error
> if we try to operate on a CPU that is not the one currently locked.
>
> Fixes: d86adb4fc0655 ("sched_ext: Add cpuperf support")
> Signed-off-by: Andrea Righi <arighi(a)nvidia.com>
> Acked-by: Changwoo Min <changwoo(a)igalia.com>
> Signed-off-by: Tejun Heo <tj(a)kernel.org>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 77cdff0d9f348..0067f540a3f0f 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -7459,13 +7459,32 @@ __bpf_kfunc void scx_bpf_cpuperf_set(s32 cpu, u32 perf)
> }
>
> if (ops_cpu_valid(cpu, NULL)) {
> - struct rq *rq = cpu_rq(cpu);
> + struct rq *rq = cpu_rq(cpu), *locked_rq = scx_locked_rq();
> + struct rq_flags rf;
> +
> + /*
> + * When called with an rq lock held, restrict the operation
> + * to the corresponding CPU to prevent ABBA deadlocks.
> + */
> + if (locked_rq && rq != locked_rq) {
> + scx_ops_error("Invalid target CPU %d", cpu);
> + return;
> + }
> +
> + /*
> + * If no rq lock is held, allow to operate on any CPU by
> + * acquiring the corresponding rq lock.
> + */
> + if (!locked_rq) {
> + rq_lock_irqsave(rq, &rf);
> + update_rq_clock(rq);
> + }
>
> rq->scx.cpuperf_target = perf;
> + cpufreq_update_util(rq, 0);
>
> - rcu_read_lock_sched_notrace();
> - cpufreq_update_util(cpu_rq(cpu), 0);
> - rcu_read_unlock_sched_notrace();
> + if (!locked_rq)
> + rq_unlock_irqrestore(rq, &rf);
> }
> }
>
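For anyone reviewing the backport, the locking rule in the quoted hunk can be modelled in user space. This is only a rough single-threaded sketch, not kernel code: the boolean flag standing in for the rq lock, the locked_rq variable standing in for scx_locked_rq(), and all model_* names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

/* Toy runqueue: a flag stands in for the real rq lock (assumption). */
struct rq {
	bool locked;
	unsigned int cpuperf_target;
};

static struct rq runqueues[NR_CPUS];

/* Stand-in for scx_locked_rq(): the rq the caller already holds, if any. */
static struct rq *locked_rq;

static void model_rq_lock(struct rq *rq)
{
	assert(!rq->locked);	/* taking it twice would self-deadlock */
	rq->locked = true;
}

static void model_rq_unlock(struct rq *rq)
{
	assert(rq->locked);
	rq->locked = false;
}

/* Returns 0 on success, -1 if the target CPU is rejected. */
static int model_cpuperf_set(int cpu, unsigned int perf)
{
	struct rq *rq;

	if (cpu < 0 || cpu >= NR_CPUS)
		return -1;
	rq = &runqueues[cpu];

	/* Holding a different rq lock: refuse rather than risk ABBA. */
	if (locked_rq && rq != locked_rq)
		return -1;

	/* No rq lock held: take the target's lock for the update. */
	if (!locked_rq)
		model_rq_lock(rq);

	assert(rq->locked);	/* plays the role of lockdep_assert_rq_held() */
	rq->cpuperf_target = perf;

	if (!locked_rq)
		model_rq_unlock(rq);
	return 0;
}
```

With no lock held, model_cpuperf_set() accepts any valid CPU and drops the lock again afterwards. With runqueues[1] held, it accepts only CPU 1 and rejects the others without touching them.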
From: Nicholas Piggin <npiggin(a)gmail.com>
[ commit 21a741eb75f80397e5f7d3739e24d7d75e619011 upstream ]
kexec on pseries disables AIL (reloc_on_exc), required for scv
instruction support, before other CPUs have been shut down. This means
they can execute scv instructions after AIL is disabled, which causes an
interrupt at an unexpected entry location that crashes the kernel.
Change the kexec sequence to disable AIL after other CPUs have been
brought down.
As a refresher, the real-mode scv interrupt vector is 0x17000, and the
fixed-location head code probably couldn't easily deal with implementing
such high addresses so it was just decided not to support that interrupt
at all.
Fixes: 7fa95f9adaee ("powerpc/64s: system call support for scv/rfscv instructions")
Cc: stable(a)vger.kernel.org # v5.9+
Reported-by: Sourabh Jain <sourabhjain(a)linux.ibm.com>
Closes: https://lore.kernel.org/3b4b2943-49ad-4619-b195-bc416f1d1409@linux.ibm.com
Signed-off-by: Nicholas Piggin <npiggin(a)gmail.com>
Tested-by: Gautam Menghani <gautam(a)linux.ibm.com>
Tested-by: Sourabh Jain <sourabhjain(a)linux.ibm.com>
Link: https://msgid.link/20240625134047.298759-1-npiggin@gmail.com
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
[pSeries_machine_kexec hadn't been moved to kexec.c in v5.10, fix context accordingly]
Signed-off-by: Feng Liu <Feng.Liu3(a)windriver.com>
Signed-off-by: He Zhe <Zhe.He(a)windriver.com>
---
Verified with a build test.
---
arch/powerpc/kexec/core_64.c | 11 +++++++++++
arch/powerpc/platforms/pseries/setup.c | 11 -----------
2 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 8a449b2d8715..ffc57d5a39a6 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -26,6 +26,7 @@
#include <asm/mmu.h>
#include <asm/sections.h> /* _end */
#include <asm/prom.h>
+#include <asm/setup.h>
#include <asm/smp.h>
#include <asm/hw_breakpoint.h>
#include <asm/asm-prototypes.h>
@@ -313,6 +314,16 @@ void default_machine_kexec(struct kimage *image)
if (!kdump_in_progress())
kexec_prepare_cpus();
+#ifdef CONFIG_PPC_PSERIES
+ /*
+ * This must be done after other CPUs have shut down, otherwise they
+ * could execute the 'scv' instruction, which is not supported with
+ * reloc disabled (see configure_exceptions()).
+ */
+ if (firmware_has_feature(FW_FEATURE_SET_MODE))
+ pseries_disable_reloc_on_exc();
+#endif
+
printk("kexec: Starting switchover sequence.\n");
/* switch to a staticly allocated stack. Based on irq stack code.
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 8e4a2e8aee11..be4d35354daf 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -409,16 +409,6 @@ void pseries_disable_reloc_on_exc(void)
}
EXPORT_SYMBOL(pseries_disable_reloc_on_exc);
-#ifdef CONFIG_KEXEC_CORE
-static void pSeries_machine_kexec(struct kimage *image)
-{
- if (firmware_has_feature(FW_FEATURE_SET_MODE))
- pseries_disable_reloc_on_exc();
-
- default_machine_kexec(image);
-}
-#endif
-
#ifdef __LITTLE_ENDIAN__
void pseries_big_endian_exceptions(void)
{
@@ -1071,7 +1061,6 @@ define_machine(pseries) {
.machine_check_early = pseries_machine_check_realmode,
.machine_check_exception = pSeries_machine_check_exception,
#ifdef CONFIG_KEXEC_CORE
- .machine_kexec = pSeries_machine_kexec,
.kexec_cpu_down = pseries_kexec_cpu_down,
#endif
#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
--
2.34.1
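
The ordering change can also be illustrated with a small user-space model. This is a hedged sketch under stated assumptions, not kernel code: struct machine and every function name below are hypothetical, and it covers only the non-kdump path, where the secondary CPUs are brought down before the switchover. It checks whether a window exists in which a secondary CPU is still online while AIL (reloc_on_exc) is already disabled, which is the state in which an scv instruction would crash the kernel.

```c
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical model of the state that matters during kexec. */
struct machine {
	bool cpu_online[NR_CPUS];
	bool ail_enabled;	/* reloc_on_exc, required for scv */
};

static void machine_init(struct machine *m)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		m->cpu_online[cpu] = true;
	m->ail_enabled = true;
}

/* CPU 0 plays the CPU driving the kexec; the others are secondaries. */
static bool secondaries_online(const struct machine *m)
{
	for (int cpu = 1; cpu < NR_CPUS; cpu++)
		if (m->cpu_online[cpu])
			return true;
	return false;
}

static void shut_down_secondaries(struct machine *m)
{
	for (int cpu = 1; cpu < NR_CPUS; cpu++)
		m->cpu_online[cpu] = false;
}

/* Old order: AIL off first, then CPUs down. Returns true if unsafe. */
static bool old_sequence_has_window(struct machine *m)
{
	bool window;

	m->ail_enabled = false;
	window = secondaries_online(m);	/* secondaries may still run scv */
	shut_down_secondaries(m);
	return window;
}

/* New order: CPUs down first, then AIL off. Returns true if unsafe. */
static bool new_sequence_has_window(struct machine *m)
{
	shut_down_secondaries(m);
	m->ail_enabled = false;
	return secondaries_online(m);
}
```

Starting from a freshly initialised machine, the old sequence reports the unsafe window and the new sequence does not.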