Hi,
I have discovered a 100% reliable soft lockup on boot on my laptop:
Purism Librem 14, Intel Core i7-10710U, 48 GB RAM, Samsung 970 EVO Plus
SSD, coreboot BIOS, GRUB bootloader, Arch Linux.
The last working release is kernel 6.9.10; every release from 6.10
onwards reliably exhibits the issue, which, based on journalctl logs,
seems to be triggered somewhere in systemd-udev:
https://gitlab.archlinux.org/-/project/42594/uploads/04583baf22189a0a8bb2f8…
Bisect points to commit 5186ba33234c9a90833f7c93ce7de80e25fac6f5
At a glance, I see two potentially problematic changes in this diff,
specifically in the refactoring that moves the call to rdt_ctrl_update
inside the loop that walks over r->domains:
1. The change from on_each_cpu_mask to smp_call_function_any means
that preemption is no longer disabled around the call to
rdt_ctrl_update, which could plausibly be a problem.
2. There's now a potential race on the msr_params struct: as far as I
can tell there's no write barrier, so if the call to rdt_ctrl_update is
executed on a different CPU, it could plausibly read an outdated value
of the dom field, which prior to this series of patches wasn't passed
as an explicit parameter but was derived inside rdt_ctrl_update (see
the sketch below).
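To make point 2 concrete, here is a minimal userspace analogy (plain C11
atomics and pthreads, NOT the actual resctrl code; the struct layout and
field names are invented for illustration): one thread fills a shared
parameter block and then signals another thread without any release
ordering, so the reader may in principle observe a stale field.

/*
 * Userspace analogy only -- not the resctrl code. Field names (dom, val)
 * and the flag are invented for illustration. Build with -pthread.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

struct params {
        int dom;        /* stand-in for the per-domain field */
        int val;
};

static struct params shared;
static atomic_int go;

static void *worker(void *arg)
{
        (void)arg;
        /* "IPI handler": spin until signalled, with no acquire ordering */
        while (!atomic_load_explicit(&go, memory_order_relaxed))
                ;
        /* without a matching release barrier this may see a stale dom */
        printf("worker saw dom=%d val=%d\n", shared.dom, shared.val);
        return NULL;
}

int main(void)
{
        pthread_t t;

        pthread_create(&t, NULL, worker, NULL);

        shared.dom = 42;        /* fill the parameter block ... */
        shared.val = 7;
        /* ... then "fire the IPI", but without release semantics */
        atomic_store_explicit(&go, 1, memory_order_relaxed);

        pthread_join(t, NULL);
        return 0;
}

Of course the smp_call_function_any() machinery may already provide the
required ordering; this only spells out the pattern I'm worried about.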
For the initial report to the Arch Linux bug tracker and the bisect log, see:
https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/74
Best
Hugues
From: Alexander Sverdlin <alexander.sverdlin(a)siemens.com>
dsa_switch_shutdown() doesn't bring down any ports; it only disconnects
the slaves from the master. Packets still arrive at the master port
afterwards and the ports keep being polled for link status. This leads to
crashes:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
CPU: 0 PID: 442 Comm: kworker/0:3 Tainted: G O 6.1.99+ #1
Workqueue: events_power_efficient phy_state_machine
pc : lan9303_mdio_phy_read
lr : lan9303_phy_read
Call trace:
lan9303_mdio_phy_read
lan9303_phy_read
dsa_slave_phy_read
__mdiobus_read
mdiobus_read
genphy_update_link
genphy_read_status
phy_check_link_status
phy_state_machine
process_one_work
worker_thread
Call lan9303_remove() instead to really unregister all ports before zeroing
drvdata and dsa_ptr.
Fixes: 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown")
Cc: stable(a)vger.kernel.org
Signed-off-by: Alexander Sverdlin <alexander.sverdlin(a)siemens.com>
---
drivers/net/dsa/lan9303-core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index 268949939636..ecd507355f51 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -1477,7 +1477,7 @@ EXPORT_SYMBOL(lan9303_remove);
void lan9303_shutdown(struct lan9303 *chip)
{
- dsa_switch_shutdown(chip->ds);
+ lan9303_remove(chip);
}
EXPORT_SYMBOL(lan9303_shutdown);
--
2.46.0
From: "Alexey Gladkov (Intel)" <legion(a)kernel.org>
TDX only supports kernel-initiated MMIO operations. The handle_mmio()
function checks if the #VE exception occurred in the kernel and rejects
the operation if it did not.
However, userspace can deceive the kernel into performing MMIO on its
behalf. For example, if userspace points a syscall at an MMIO address, the
syscall does get_user() or put_user() on it, triggering an MMIO #VE. The
kernel will then treat the #VE as in-kernel MMIO.
Ensure that the target MMIO address is within the kernel before decoding
the instruction.
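To make the scenario concrete, here is an illustrative userspace sketch
(not part of the patch; error handling is omitted and the sysfs path is
only an example, any user-accessible MMIO mapping would do): the guest
process maps a device BAR and hands that address to read() as an ordinary
buffer, so the kernel ends up doing the copy into MMIO from kernel
context and takes the #VE there.

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        /* Example MMIO mapping: a PCI BAR exposed via sysfs (path is illustrative) */
        int bar = open("/sys/bus/pci/devices/0000:00:01.0/resource0", O_RDWR);
        void *mmio = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_SHARED, bar, 0);
        int zero = open("/dev/zero", O_RDONLY);

        /*
         * The kernel copies the read() result into the user-supplied
         * buffer, which here points at MMIO, so the resulting #VE is
         * taken while the CPU is in kernel context.
         */
        read(zero, mmio, 8);
        return 0;
}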
Cc: stable(a)vger.kernel.org
Reviewed-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Signed-off-by: Alexey Gladkov (Intel) <legion(a)kernel.org>
---
arch/x86/coco/tdx/tdx.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 078e2bac2553..d6e6407e3999 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -16,6 +16,7 @@
#include <asm/insn-eval.h>
#include <asm/pgtable.h>
#include <asm/set_memory.h>
+#include <asm/traps.h>
/* MMIO direction */
#define EPT_READ 0
@@ -434,6 +435,11 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
return -EINVAL;
}
+ if (!fault_in_kernel_space(ve->gla)) {
+ WARN_ONCE(1, "Access to userspace address is not supported");
+ return -EINVAL;
+ }
+
/*
* Reject EPT violation #VEs that split pages.
*
--
2.46.0
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Since drm_sched_entity_modify_sched() can modify the entity's run queue,
let's make sure to only dereference the pointer once, so that both adding
and waking up are guaranteed to be consistent.
The alternative of moving the spin_unlock to after the wake-up would for
now be more problematic, since the same lock is taken inside
drm_sched_rq_update_fifo().
v2:
* Improve commit message. (Philipp)
* Cache the scheduler pointer directly. (Christian)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list")
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Luben Tuikov <ltuikov89(a)gmail.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: Philipp Stanner <pstanner(a)redhat.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.7+
Reviewed-by: Christian König <christian.koenig(a)amd.com>
---
drivers/gpu/drm/scheduler/sched_entity.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index ae8be30472cd..76e422548d40 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -599,6 +599,9 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
/* first job wakes up scheduler */
if (first) {
+ struct drm_gpu_scheduler *sched;
+ struct drm_sched_rq *rq;
+
/* Add the entity to the run queue */
spin_lock(&entity->rq_lock);
if (entity->stopped) {
@@ -608,13 +611,16 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
return;
}
- drm_sched_rq_add_entity(entity->rq, entity);
+ rq = entity->rq;
+ sched = rq->sched;
+
+ drm_sched_rq_add_entity(rq, entity);
spin_unlock(&entity->rq_lock);
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
drm_sched_rq_update_fifo(entity, submit_ts);
- drm_sched_wakeup(entity->rq->sched, entity);
+ drm_sched_wakeup(sched, entity);
}
}
EXPORT_SYMBOL(drm_sched_entity_push_job);
--
2.46.0
Disable CRYPTO_AES_GCM_P10 in Kconfig first so that the subsequent
patches fixing the data mismatch over an IPsec tunnel can be applied.
Signed-off-by: Danny Tsen <dtsen(a)linux.ibm.com>
---
arch/powerpc/crypto/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/powerpc/crypto/Kconfig b/arch/powerpc/crypto/Kconfig
index 09ebcbdfb34f..46a4c85e85e2 100644
--- a/arch/powerpc/crypto/Kconfig
+++ b/arch/powerpc/crypto/Kconfig
@@ -107,6 +107,7 @@ config CRYPTO_AES_PPC_SPE
config CRYPTO_AES_GCM_P10
tristate "Stitched AES/GCM acceleration support on P10 or later CPU (PPC)"
+ depends on BROKEN
depends on PPC64 && CPU_LITTLE_ENDIAN && VSX
select CRYPTO_LIB_AES
select CRYPTO_ALGAPI
--
2.43.0
[ Upstream commit 858430db28a5f5a11f8faa3a6fa805438e6f0851 ]
axienet_dma_err_handler can race with axienet_stop in the following
manner:
CPU 1                       CPU 2
======================      ==================
axienet_stop()
    napi_disable()
    axienet_dma_stop()
                            axienet_dma_err_handler()
                                napi_disable()
                                axienet_dma_stop()
                                axienet_dma_start()
                                napi_enable()
    cancel_work_sync()
    free_irq()
Fix this by setting a flag in axienet_stop telling
axienet_dma_err_handler not to bother doing anything. I chose not to use
disable_work_sync to allow for easier backporting.
Signed-off-by: Sean Anderson <sean.anderson(a)linux.dev>
Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
Link: https://patch.msgid.link/20240903175141.4132898-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
[ Adjusted to apply before dmaengine support ]
Signed-off-by: Sean Anderson <sean.anderson(a)linux.dev>
---
This patch is adjusted to apply to v6.6 kernels and earlier, which do
not contain commit 6a91b846af85 ("net: axienet: Introduce dmaengine
support").
drivers/net/ethernet/xilinx/xilinx_axienet.h | 3 +++
drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 8 ++++++++
2 files changed, 11 insertions(+)
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h b/drivers/net/ethernet/xilinx/xilinx_axienet.h
index f09f10f17d7e..2facbdfbb319 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet.h
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h
@@ -419,6 +419,8 @@ struct axidma_bd {
* @tx_bytes: TX byte count for statistics
* @tx_stat_sync: Synchronization object for TX stats
* @dma_err_task: Work structure to process Axi DMA errors
+ * @stopping: Set when @dma_err_task shouldn't do anything because we are
+ * about to stop the device.
* @tx_irq: Axidma TX IRQ number
* @rx_irq: Axidma RX IRQ number
* @eth_irq: Ethernet core IRQ number
@@ -481,6 +483,7 @@ struct axienet_local {
struct u64_stats_sync tx_stat_sync;
struct work_struct dma_err_task;
+ bool stopping;
int tx_irq;
int rx_irq;
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
index 144feb7a2fda..65d7aaad43fe 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
@@ -1162,6 +1162,7 @@ static int axienet_open(struct net_device *ndev)
phylink_start(lp->phylink);
/* Enable worker thread for Axi DMA error handling */
+ lp->stopping = false;
INIT_WORK(&lp->dma_err_task, axienet_dma_err_handler);
napi_enable(&lp->napi_rx);
@@ -1217,6 +1218,9 @@ static int axienet_stop(struct net_device *ndev)
dev_dbg(&ndev->dev, "axienet_close()\n");
+ WRITE_ONCE(lp->stopping, true);
+ flush_work(&lp->dma_err_task);
+
napi_disable(&lp->napi_tx);
napi_disable(&lp->napi_rx);
@@ -1761,6 +1765,10 @@ static void axienet_dma_err_handler(struct work_struct *work)
dma_err_task);
struct net_device *ndev = lp->ndev;
+ /* Don't bother if we are going to stop anyway */
+ if (READ_ONCE(lp->stopping))
+ return;
+
napi_disable(&lp->napi_tx);
napi_disable(&lp->napi_rx);
--
2.35.1.1320.gc452695387.dirty
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 7f828d5fff7d24752e1ecf6bebb6617a81f97b93
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024091354-stock-suspend-e1ee@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
7f828d5fff7d ("clocksource: hyper-v: Use lapic timer in a TDX VM without paravisor")
b967df629351 ("hyperv-tlfs: Rename some HV_REGISTER_* defines for consistency")
0e3f7d120086 ("hyperv-tlfs: Change prefix of generic HV_REGISTER_* MSRs to HV_MSR_*")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7f828d5fff7d24752e1ecf6bebb6617a81f97b93 Mon Sep 17 00:00:00 2001
From: Dexuan Cui <decui(a)microsoft.com>
Date: Thu, 20 Jun 2024 23:16:14 -0700
Subject: [PATCH] clocksource: hyper-v: Use lapic timer in a TDX VM without
paravisor
In a TDX VM without paravisor, currently the default timer is the Hyper-V
timer, which depends on the slow VM Reference Counter MSR: the Hyper-V TSC
page is not enabled in such a VM because the VM uses Invariant TSC as a
better clocksource and it's challenging to mark the Hyper-V TSC page shared
in very early boot.
Lower the rating of the Hyper-V timer so the local APIC timer becomes the
default timer in such a VM, and print a warning in case Invariant TSC
is unavailable in such a VM. This change should cause no perceivable
performance difference.
Cc: stable(a)vger.kernel.org # 6.6+
Reviewed-by: Roman Kisel <romank(a)linux.microsoft.com>
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Reviewed-by: Michael Kelley <mhklinux(a)outlook.com>
Link: https://lore.kernel.org/r/20240621061614.8339-1-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu(a)kernel.org>
Message-ID: <20240621061614.8339-1-decui(a)microsoft.com>
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index e0fd57a8ba84..954b7cbfa2f0 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -449,9 +449,23 @@ static void __init ms_hyperv_init_platform(void)
ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
if (!ms_hyperv.paravisor_present) {
- /* To be supported: more work is required. */
+ /*
+ * Mark the Hyper-V TSC page feature as disabled
+ * in a TDX VM without paravisor so that the
+ * Invariant TSC, which is a better clocksource
+ * anyway, is used instead.
+ */
ms_hyperv.features &= ~HV_MSR_REFERENCE_TSC_AVAILABLE;
+ /*
+ * The Invariant TSC is expected to be available
+ * in a TDX VM without paravisor, but if not,
+ * print a warning message. The slower Hyper-V MSR-based
+ * Ref Counter should end up being the clocksource.
+ */
+ if (!(ms_hyperv.features & HV_ACCESS_TSC_INVARIANT))
+ pr_warn("Hyper-V: Invariant TSC is unavailable\n");
+
/* HV_MSR_CRASH_CTL is unsupported. */
ms_hyperv.misc_features &= ~HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index b2a080647e41..99177835cade 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -137,7 +137,21 @@ static int hv_stimer_init(unsigned int cpu)
ce->name = "Hyper-V clockevent";
ce->features = CLOCK_EVT_FEAT_ONESHOT;
ce->cpumask = cpumask_of(cpu);
- ce->rating = 1000;
+
+ /*
+ * Lower the rating of the Hyper-V timer in a TDX VM without paravisor,
+ * so the local APIC timer (lapic_clockevent) is the default timer in
+ * such a VM. The Hyper-V timer is not preferred in such a VM because
+ * it depends on the slow VM Reference Counter MSR (the Hyper-V TSC
+ * page is not enabled in such a VM because the VM uses Invariant TSC
+ * as a better clocksource and it's challenging to mark the Hyper-V
+ * TSC page shared in very early boot).
+ */
+ if (!ms_hyperv.paravisor_present && hv_isolation_type_tdx())
+ ce->rating = 90;
+ else
+ ce->rating = 1000;
+
ce->set_state_shutdown = hv_ce_shutdown;
ce->set_state_oneshot = hv_ce_set_oneshot;
ce->set_next_event = hv_ce_set_next_event;