Check for additional CPUID bits to identify TDX guests running with Trust
Domain (TD) partitioning enabled. TD partitioning is like nested virtualization
inside the Trust Domain, so there is an L1 TD VM(M) and there can be L2 TD VM(s).
In this arrangement we are not guaranteed that the TDX_CPUID_LEAF_ID is visible
to Linux running as an L2 TD VM. This is because a majority of TDX facilities
are controlled by the L1 VMM and the L2 TDX guest needs to use TD partitioning
aware mechanisms for what's left. So currently such guests do not have
X86_FEATURE_TDX_GUEST set.
We want the kernel to have X86_FEATURE_TDX_GUEST set for all TDX guests so we
need to check these additional CPUID bits, but we skip further initialization
in the function as we aren't guaranteed access to TDX module calls.
Cc: <stable(a)vger.kernel.org> # v6.5+
Signed-off-by: Jeremi Piotrowski <jpiotrowski(a)linux.microsoft.com>
---
arch/x86/coco/tdx/tdx.c | 29 ++++++++++++++++++++++++++---
arch/x86/include/asm/tdx.h | 3 +++
2 files changed, 29 insertions(+), 3 deletions(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 1d6b863c42b0..c7bbbaaf654d 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -8,6 +8,7 @@
#include <linux/export.h>
#include <linux/io.h>
#include <asm/coco.h>
+#include <asm/hyperv-tlfs.h>
#include <asm/tdx.h>
#include <asm/vmx.h>
#include <asm/insn.h>
@@ -37,6 +38,8 @@
#define TDREPORT_SUBTYPE_0 0
+bool tdx_partitioning_active;
+
/* Called from __tdx_hypercall() for unrecoverable failure */
noinstr void __tdx_hypercall_failed(void)
{
@@ -757,19 +760,38 @@ static bool tdx_enc_status_change_finish(unsigned long vaddr, int numpages,
return true;
}
+
+static bool early_is_hv_tdx_partitioning(void)
+{
+ u32 eax, ebx, ecx, edx;
+ cpuid(HYPERV_CPUID_ISOLATION_CONFIG, &eax, &ebx, &ecx, &edx);
+ return eax & HV_PARAVISOR_PRESENT &&
+ (ebx & HV_ISOLATION_TYPE) == HV_ISOLATION_TYPE_TDX;
+}
+
void __init tdx_early_init(void)
{
u64 cc_mask;
u32 eax, sig[3];
cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &sig[0], &sig[2], &sig[1]);
-
- if (memcmp(TDX_IDENT, sig, sizeof(sig)))
- return;
+ if (memcmp(TDX_IDENT, sig, sizeof(sig))) {
+ tdx_partitioning_active = early_is_hv_tdx_partitioning();
+ if (!tdx_partitioning_active)
+ return;
+ }
setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
cc_vendor = CC_VENDOR_INTEL;
+
+ /*
+ * Need to defer cc_mask and page visibility callback initializations
+ * to a TD-partitioning aware implementation.
+ */
+ if (tdx_partitioning_active)
+ goto exit;
+
tdx_parse_tdinfo(&cc_mask);
cc_set_mask(cc_mask);
@@ -820,5 +842,6 @@ void __init tdx_early_init(void)
*/
x86_cpuinit.parallel_bringup = false;
+exit:
pr_info("Guest detected\n");
}
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 603e6d1e9d4a..fe22f8675859 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -52,6 +52,7 @@ bool tdx_early_handle_ve(struct pt_regs *regs);
int tdx_mcall_get_report0(u8 *reportdata, u8 *tdreport);
+extern bool tdx_partitioning_active;
#else
static inline void tdx_early_init(void) { };
@@ -71,6 +72,8 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
{
return -ENODEV;
}
+
+#define tdx_partitioning_active false
#endif /* CONFIG_INTEL_TDX_GUEST && CONFIG_KVM_GUEST */
#endif /* !__ASSEMBLY__ */
#endif /* _ASM_X86_TDX_H */
--
2.39.2
The RTC on the Mox shares its interrupt line with the moxtet bus. Set
the interrupt type to be consistent between both devices. This ensures
correct setup of the interrupt line regardless of probing order.
Signed-off-by: Sjoerd Simons <sjoerd(a)collabora.com>
Cc: stable(a)vger.kernel.org # v6.2+
Fixes: 21aad8ba615e ("arm64: dts: armada-3720-turris-mox: Add missing interrupt for RTC")
---
(no changes since v1)
arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts b/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts
index 9eab2bb22134..805ef2d79b40 100644
--- a/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts
+++ b/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts
@@ -130,7 +130,7 @@ rtc@6f {
compatible = "microchip,mcp7940x";
reg = <0x6f>;
interrupt-parent = <&gpiosb>;
- interrupts = <5 0>; /* GPIO2_5 */
+ interrupts = <5 IRQ_TYPE_EDGE_FALLING>; /* GPIO2_5 */
};
};
--
2.43.0
The Turris Mox shares the moxtet IRQ with various devices on the board,
so mark the IRQ as shared in the driver as well.
Without this, loading the module fails with:
genirq: Flags mismatch irq 40. 00002002 (moxtet) vs. 00002080 (mcp7940x)
Signed-off-by: Sjoerd Simons <sjoerd(a)collabora.com>
Cc: stable(a)vger.kernel.org # v6.2+
---
(no changes since v1)
drivers/bus/moxtet.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/bus/moxtet.c b/drivers/bus/moxtet.c
index 5eb0fe73ddc4..48c18f95660a 100644
--- a/drivers/bus/moxtet.c
+++ b/drivers/bus/moxtet.c
@@ -755,7 +755,7 @@ static int moxtet_irq_setup(struct moxtet *moxtet)
moxtet->irq.masked = ~0;
ret = request_threaded_irq(moxtet->dev_irq, NULL, moxtet_irq_thread_fn,
- IRQF_ONESHOT, "moxtet", moxtet);
+ IRQF_SHARED | IRQF_ONESHOT, "moxtet", moxtet);
if (ret < 0)
goto err_free;
--
2.43.0
A refcount issue can appear in __fwnode_link_del() due to the
pr_debug() call:
WARNING: CPU: 0 PID: 901 at lib/refcount.c:25 refcount_warn_saturate+0xe5/0x110
Call Trace:
<TASK>
...
of_node_get+0x1e/0x30
of_fwnode_get+0x28/0x40
fwnode_full_name_string+0x34/0x90
fwnode_string+0xdb/0x140
...
vsnprintf+0x17b/0x630
...
__fwnode_link_del+0x25/0xa0
fwnode_links_purge+0x39/0xb0
of_node_release+0xd9/0x180
...
Indeed, an fwnode (of_node) is being destroyed and so, of_node_release()
is called because the of_node refcount reached 0.
From of_node_release(), several function calls lead to a pr_debug()
call that uses %pfwf to print the fwnode full name.
The issue is not present if we change %pfwf to %pfwP.
To print the full name, %pfwf iterates over the current node and its
parents and obtains/drops a reference to all nodes involved.
To allow printing the full name (%pfwf) of a node while it is being
destroyed, do not obtain/drop a reference to the current node.
Fixes: a92eb7621b9f ("lib/vsprintf: Make use of fwnode API to obtain node names and separators")
Cc: stable(a)vger.kernel.org
Signed-off-by: Herve Codina <herve.codina(a)bootlin.com>
Reviewed-by: Sakari Ailus <sakari.ailus(a)linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
---
Changes v2 -> v3
- Fix typo in comment ("ie parents node" -> "i.e. parent nodes")
- Add 'Reviewed-by: Sakari Ailus <sakari.ailus(a)linux.intel.com>'
- Add 'Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>'
Changes v1 -> v2
- Avoid handling current node out of the loop. Instead obtain/drop references
in the loop based on the depth value.
- Remove some of the backtrace lines in the commit log.
lib/vsprintf.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index afb88b24fa74..2aa408441cd3 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -2110,15 +2110,20 @@ char *fwnode_full_name_string(struct fwnode_handle *fwnode, char *buf,
/* Loop starting from the root node to the current node. */
for (depth = fwnode_count_parents(fwnode); depth >= 0; depth--) {
- struct fwnode_handle *__fwnode =
- fwnode_get_nth_parent(fwnode, depth);
+ /*
+ * Only get a reference for other nodes (i.e. parent nodes).
+ * fwnode refcount may be 0 here.
+ */
+ struct fwnode_handle *__fwnode = depth ?
+ fwnode_get_nth_parent(fwnode, depth) : fwnode;
buf = string(buf, end, fwnode_get_name_prefix(__fwnode),
default_str_spec);
buf = string(buf, end, fwnode_get_name(__fwnode),
default_str_spec);
- fwnode_handle_put(__fwnode);
+ if (depth)
+ fwnode_handle_put(__fwnode);
}
return buf;
--
2.41.0
Hi all,
This series fixes some long-standing issues in the kernel that prevent
some machines from working properly.
Hopefully it will rescue some systems in the wild :-)
Thanks
Signed-off-by: Jiaxun Yang <jiaxun.yang(a)flygoat.com>
---
Changes in v2:
- Typo and style fixes
- Link to v1: https://lore.kernel.org/r/20231101-loongson64_fixes-v1-0-2a2582a4bfa9@flygo…
---
Jiaxun Yang (3):
MIPS: Loongson64: Reserve vgabios memory on boot
MIPS: Loongson64: Enable DMA noncoherent support
MIPS: Loongson64: Handle more memory types passed from firmware
arch/mips/Kconfig | 2 +
arch/mips/include/asm/mach-loongson64/boot_param.h | 9 ++++-
arch/mips/loongson64/env.c | 10 ++++-
arch/mips/loongson64/init.c | 47 ++++++++++++++--------
4 files changed, 49 insertions(+), 19 deletions(-)
---
base-commit: 9c2d379d63450ae464eeab45462e0cb573cd97d0
change-id: 20231101-loongson64_fixes-0afb1b503d1e
Best regards,
--
Jiaxun Yang <jiaxun.yang(a)flygoat.com>
io_uring sets up the io worker kernel thread via a syscall issued from a
user space process. This process might have used the FPU, and since
copy_thread() didn't clear the FPU state for kernel threads, a BUG()
is triggered for using the FPU inside the kernel. Move code around
to always clear the FPU state for both user and kernel threads.
Cc: stable(a)vger.kernel.org
Reported-by: Aurelien Jarno <aurel32(a)debian.org>
Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1055021
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
---
arch/mips/kernel/process.c | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index 5387ed0a5186..b630604c577f 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -121,6 +121,19 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
/* Put the stack after the struct pt_regs. */
childksp = (unsigned long) childregs;
p->thread.cp0_status = (read_c0_status() & ~(ST0_CU2|ST0_CU1)) | ST0_KERNEL_CUMASK;
+
+ /*
+ * New tasks lose permission to use the fpu. This accelerates context
+ * switching for most programs since they don't use the fpu.
+ */
+ clear_tsk_thread_flag(p, TIF_USEDFPU);
+ clear_tsk_thread_flag(p, TIF_USEDMSA);
+ clear_tsk_thread_flag(p, TIF_MSA_CTX_LIVE);
+
+#ifdef CONFIG_MIPS_MT_FPAFF
+ clear_tsk_thread_flag(p, TIF_FPUBOUND);
+#endif /* CONFIG_MIPS_MT_FPAFF */
+
if (unlikely(args->fn)) {
/* kernel thread */
unsigned long status = p->thread.cp0_status;
@@ -149,20 +162,8 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
p->thread.reg29 = (unsigned long) childregs;
p->thread.reg31 = (unsigned long) ret_from_fork;
- /*
- * New tasks lose permission to use the fpu. This accelerates context
- * switching for most programs since they don't use the fpu.
- */
childregs->cp0_status &= ~(ST0_CU2|ST0_CU1);
- clear_tsk_thread_flag(p, TIF_USEDFPU);
- clear_tsk_thread_flag(p, TIF_USEDMSA);
- clear_tsk_thread_flag(p, TIF_MSA_CTX_LIVE);
-
-#ifdef CONFIG_MIPS_MT_FPAFF
- clear_tsk_thread_flag(p, TIF_FPUBOUND);
-#endif /* CONFIG_MIPS_MT_FPAFF */
-
#ifdef CONFIG_MIPS_FP_SUPPORT
atomic_set(&p->thread.bd_emu_frame, BD_EMUFRAME_NONE);
#endif
--
2.35.3
Hi, all
We are encountering a perf related soft lockup as shown below:
[25023823.265138] watchdog: BUG: soft lockup - CPU#29 stuck for 45s! [YD:3284696]
[25023823.275772] net_failover virtio_scsi failover
[25023823.276750] CPU: 29 PID: 3284696 Comm: YD Kdump: loaded Not tainted 4.19.90-23.18.v2101.ky10.aarch64 #1
[25023823.278257] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[25023823.279475] pstate: 80400005 (Nzcv daif +PAN -UAO)
[25023823.280516] pc : perf_iterate_sb+0x1b8/0x1f0
[25023823.281530] lr : perf_iterate_sb+0x18c/0x1f0
[25023823.282529] sp : ffff801f282efbf0
[25023823.283446] x29: ffff801f282efbf0 x28: ffff801f207a8b80
[25023823.284551] x27: 0000000000000000 x26: ffff801f99b355e8
[25023823.285674] x25: 0000000000000000 x24: ffff8019e2fbd800
[25023823.286770] x23: ffff0000093f0018 x22: ffff801f282efc40
[25023823.287864] x21: ffff000008255f60 x20: ffff801ffdf58e80
[25023823.288964] x19: ffff8019f1c27800 x18: 0000000000000000
[25023823.290060] x17: 0000000000000000 x16: 0000000000000000
[25023823.291164] x15: 0400000000000000 x14: 0000000000000000
[25023823.292266] x13: ffff000008c6e340 x12: 0000000000000002
[25023823.293381] x11: ffff000008c6e318 x10: 00000019e5feff20
[25023823.294486] x9 : ffff8019fb49c000 x8 : 0058e6fd335b260e
[25023823.295597] x7 : 0000000100321ed8 x6 : ffff00003d083780
[25023823.296715] x5 : 00ffffffffffffff x4 : 0000801ff4ae0000
[25023823.297860] x3 : ffff801ffdf64cc0 x2 : ffff000009858758
[25023823.298977] x1 : 0000000000000000 x0 : ffff8019e2fbd800
[25023823.300090] Call trace:
[25023823.300962] perf_iterate_sb+0x1b8/0x1f0
[25023823.301961] perf_event_task+0x78/0x80
[25023823.302946] perf_event_exit_task+0xa4/0xb0
[25023823.303978] do_exit+0x38c/0x5d0
[25023823.304932] do_group_exit+0x3c/0xd8
[25023823.305904] get_signal+0x12c/0x740
[25023823.306859] do_signal+0x158/0x260
[25023823.307795] do_notify_resume+0xd8/0x358
[25023823.308781] work_pending+0x8/0x10
We got a vmcore by enabling panic_on_soft_lockup. From the vmcore we
found that the perf_event accessed through
perf_iterate_sb -> perf_iterate_sb_cpu -> event_filter_match ->
pmu_filter_match -> for_each_sibling_event
had already been removed:
#define for_each_sibling_event(sibling, event) \
if ((event)->group_leader == (event)) \
list_for_each_entry((sibling), &(event)->sibling_list, sibling_list)
#define list_for_each_entry(pos, head, member) \
for (pos = __container_of((head)->next, pos, member); \
&pos->member != (head); \
pos = __container_of(pos->member.next, pos, member))
crash> struct perf_event ffff8019e2fbd800
struct perf_event {
event_entry = {
next = 0xffff8019f1c27800,
prev = 0xdead000000000200
},
...
state = PERF_EVENT_STATE_DEAD,
...
}
By the way, we also found another process which is deleting sibling_list:
crash> bt 3284533
PID: 3284533 TASK: ffff801f901ae880 CPU: 16 COMMAND: "YD"
#0 [ffff801f8cd977f0] __switch_to at ffff000008088ba4
#1 [ffff801f8cd97810] __schedule at ffff000008bf10c4
#2 [ffff801f8cd97890] schedule at ffff000008bf17b0
#3 [ffff801f8cd978a0] schedule_timeout at ffff000008bf5b10
#4 [ffff801f8cd97960] wait_for_common at ffff000008bf2530
#5 [ffff801f8cd979f0] wait_for_completion at ffff000008bf2644
#6 [ffff801f8cd97a10] __wait_rcu_gp at ffff000008171c00
#7 [ffff801f8cd97a80] synchronize_sched at ffff000008179da8
#8 [ffff801f8cd97ad0] perf_trace_event_unreg at ffff000008216d50
#9 [ffff801f8cd97b00] perf_trace_destroy at ffff000008217148
#10 [ffff801f8cd97b20] tp_perf_event_destroy at ffff000008256ae0
#11 [ffff801f8cd97b30] _free_event at ffff00000825f21c
#12 [ffff801f8cd97b70] put_event at ffff00000825faf0
#13 [ffff801f8cd97b80] perf_event_release_kernel at ffff00000825fcb8
#14 [ffff801f8cd97be0] perf_release at ffff00000825fdbc
#15 [ffff801f8cd97bf0] __fput at ffff00000832f0b8
#16 [ffff801f8cd97c30] ____fput at ffff00000832f28c
#17 [ffff801f8cd97c50] task_work_run at ffff00000810f8c8
#18 [ffff801f8cd97c90] do_exit at ffff0000080ef458
#19 [ffff801f8cd97cf0] do_group_exit at ffff0000080ef738
#20 [ffff801f8cd97d20] get_signal at ffff0000080fdde0
#21 [ffff801f8cd97d90] do_signal at ffff00000808e488
#22 [ffff801f8cd97e80] do_notify_resume at ffff00000808e7f4
#23 [ffff801f8cd97ff0] work_pending at ffff000008083f60
So it's reasonable to suspect that perf_iterate_sb is traversing
sibling_list while another process is deleting it, which eventually
caused for_each_sibling_event to loop endlessly and thus triggered the
soft lockup.
The race scenario could thus be this:
CPU 29:                       CPU 16:
                              perf_event_release_kernel
                              --> mutex_lock(&ctx->mutex)
                              --> perf_remove_from_context
                              --> perf_group_detach(event);
for_each_sibling_event()      --> list_del_init(&event->sibling_list)
As commit f3c0eba287049 ("perf: Add a few assertions") said:
"Notable for_each_sibling_event() relies on exclusion from
modification. This would normally be holding either ctx->lock or
ctx->mutex, however due to how things are constructed disabling IRQs
is a valid and sufficient substitute for ctx->lock.", we think it's
necessary to hold ctx->mutex, but current LTS kernels such as 4.19, 5.4,
5.10, and 6.1 do not do so:
perf_event_task
--> perf_iterate_sb
--> perf_iterate_sb_cpu
--> event_filter_match
--> pmu_filter_match
--> for_each_sibling_event
Commit bd27568117664 ("perf: Rewrite core context handling") removed
the pmu_filter_match operation, so it may serve as a temporary workaround
for this issue.
But we still need to confirm whether there is a race on sibling_list
and, if so, how to fix the current LTS branches.
Thanks in advance.