We started hitting the same netdevice refcount leak that had been
reported and fixed in mainline, and that Olivier had backported into
v5.15.142. So this is the same series further backported to the
latest v5.10 which fixes the issues we were seeing there.
This version also applies without change on v5.4 which should also
be affected, but it was only build-tested there as we don't have an
easy way to run our tests on that version.
Reference: https://lore.kernel.org/stable/20231201133004.3853933-1-olivier.matz@6wind.…
Xin Long (2):
vlan: introduce vlan_dev_free_egress_priority
vlan: move dev_put into vlan_dev_uninit
net/8021q/vlan.h | 2 +-
net/8021q/vlan_dev.c | 15 +++++++++++----
net/8021q/vlan_netlink.c | 7 ++++---
3 files changed, 16 insertions(+), 8 deletions(-)
--
2.34.1
device_del() can lead to new work being scheduled in gadget->work
workqueue. This is observed, for example, with the dwc3 driver with the
following call stack:
device_del()
gadget_unbind_driver()
usb_gadget_disconnect_locked()
dwc3_gadget_pullup()
dwc3_gadget_soft_disconnect()
usb_gadget_set_state()
schedule_work(&gadget->work)
Move flush_work() after device_del() to ensure the workqueue is cleaned
up.
Fixes: 5702f75375aa9 ("usb: gadget: udc-core: move sysfs_notify() to a workqueue")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Roy Luo <royluo(a)google.com>
---
drivers/usb/gadget/udc/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c
index a6f46364be65..4b3d5075621a 100644
--- a/drivers/usb/gadget/udc/core.c
+++ b/drivers/usb/gadget/udc/core.c
@@ -1543,8 +1543,8 @@ void usb_del_gadget(struct usb_gadget *gadget)
kobject_uevent(&udc->dev.kobj, KOBJ_REMOVE);
sysfs_remove_link(&udc->dev.kobj, "gadget");
- flush_work(&gadget->work);
device_del(&gadget->dev);
+ flush_work(&gadget->work);
ida_free(&gadget_id_numbers, gadget->id_number);
cancel_work_sync(&udc->vbus_work);
device_unregister(&udc->dev);
base-commit: f286757b644c226b6b31779da95a4fa7ab245ef5
--
2.48.1.362.g079036d154-goog
On destroy, we should set each node dead. But current code miss this
when the maple tree has only the root node.
The reason is mt_destroy_walk() leverage mte_destroy_descend() to set
node dead, but this is skipped since the only root node is a leaf.
This patch fixes this by setting the root dead before mt_destroy_walk().
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
CC: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: <stable(a)vger.kernel.org>
---
lib/maple_tree.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 198c14dd3377..d31f0a2858f7 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -5347,6 +5347,8 @@ static inline void mte_destroy_walk(struct maple_enode *enode,
{
struct maple_node *node = mte_to_node(enode);
+ mte_set_node_dead(enode);
+
if (mt_in_rcu(mt)) {
mt_destroy_walk(enode, mt, false);
call_rcu(&node->rcu, mt_free_walk);
--
2.34.1
Direct HLT instruction execution causes #VEs for TDX VMs which is routed
to hypervisor via TDCALL. safe_halt() routines execute HLT in STI-shadow
so IRQs need to remain disabled until the TDCALL to ensure that pending
IRQs are correctly treated as wake events. So "sti;hlt" sequence needs to
be replaced with "TDCALL; raw_local_irq_enable()" for TDX VMs.
Commit bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests")
prevented the idle routines from using "sti;hlt". But it missed the
paravirt routine which can be reached like this as an example:
acpi_safe_halt() =>
raw_safe_halt() =>
arch_safe_halt() =>
irq.safe_halt() =>
pv_native_safe_halt()
Modify tdx_safe_halt() to implement the sequence "TDCALL;
raw_local_irq_enable()" and invoke tdx_halt() from idle routine which just
executes TDCALL without changing state of interrupts.
If CONFIG_PARAVIRT_XXL is disabled, "sti;hlt" sequences can still get
executed from TDX VMs via paths like:
acpi_safe_halt() =>
raw_safe_halt() =>
arch_safe_halt() =>
native_safe_halt()
There is a long term plan to fix these paths by carving out
irq.safe_halt() outside paravirt framework.
Cc: stable(a)vger.kernel.org
Fixes: bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests")
Reviewed-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>@linux.intel.com>
Signed-off-by: Vishal Annapurve <vannapurve(a)google.com>
---
Changes since V2:
1) Addressed comments from Dave H and Kirill S.
V2: https://lore.kernel.org/lkml/20250129232525.3519586-1-vannapurve@google.com/
arch/x86/coco/tdx/tdx.c | 18 +++++++++++++++++-
arch/x86/include/asm/tdx.h | 2 +-
arch/x86/kernel/process.c | 2 +-
3 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 32809a06dab4..5e68758666a4 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -14,6 +14,7 @@
#include <asm/ia32.h>
#include <asm/insn.h>
#include <asm/insn-eval.h>
+#include <asm/paravirt_types.h>
#include <asm/pgtable.h>
#include <asm/set_memory.h>
#include <asm/traps.h>
@@ -398,7 +399,7 @@ static int handle_halt(struct ve_info *ve)
return ve_instr_len(ve);
}
-void __cpuidle tdx_safe_halt(void)
+void __cpuidle tdx_halt(void)
{
const bool irq_disabled = false;
@@ -409,6 +410,12 @@ void __cpuidle tdx_safe_halt(void)
WARN_ONCE(1, "HLT instruction emulation failed\n");
}
+static void __cpuidle tdx_safe_halt(void)
+{
+ tdx_halt();
+ raw_local_irq_enable();
+}
+
static int read_msr(struct pt_regs *regs, struct ve_info *ve)
{
struct tdx_module_args args = {
@@ -1109,6 +1116,15 @@ void __init tdx_early_init(void)
x86_platform.guest.enc_kexec_begin = tdx_kexec_begin;
x86_platform.guest.enc_kexec_finish = tdx_kexec_finish;
+#ifdef CONFIG_PARAVIRT_XXL
+ /*
+ * Avoid the literal hlt instruction in TDX guests. hlt will
+ * induce a #VE in the STI-shadow which will enable interrupts
+ * in a place where they are not wanted.
+ */
+ pv_ops.irq.safe_halt = tdx_safe_halt;
+#endif
+
/*
* TDX intercepts the RDMSR to read the X2APIC ID in the parallel
* bringup low level code. That raises #VE which cannot be handled
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index b4b16dafd55e..393ee2dfaab1 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -58,7 +58,7 @@ void tdx_get_ve_info(struct ve_info *ve);
bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve);
-void tdx_safe_halt(void);
+void tdx_halt(void);
bool tdx_early_handle_ve(struct pt_regs *regs);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 6da6769d7254..d11956a178df 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -934,7 +934,7 @@ void __init select_idle_routine(void)
static_call_update(x86_idle, mwait_idle);
} else if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
pr_info("using TDX aware idle routine\n");
- static_call_update(x86_idle, tdx_safe_halt);
+ static_call_update(x86_idle, tdx_halt);
} else {
static_call_update(x86_idle, default_idle);
}
--
2.48.1.502.g6dc24dfdaf-goog
The patch titled
Subject: kasan: don't call find_vm_area() in RT kernel
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kasan-dont-call-find_vm_area-in-rt-kernel.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Waiman Long <longman(a)redhat.com>
Subject: kasan: don't call find_vm_area() in RT kernel
Date: Tue, 11 Feb 2025 11:07:50 -0500
The following bug report appeared with a test run in a RT debug kernel.
[ 3359.353842] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[ 3359.353848] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 140605, name: kunit_try_catch
[ 3359.353853] preempt_count: 1, expected: 0
:
[ 3359.353933] Call trace:
:
[ 3359.353955] rt_spin_lock+0x70/0x140
[ 3359.353959] find_vmap_area+0x84/0x168
[ 3359.353963] find_vm_area+0x1c/0x50
[ 3359.353966] print_address_description.constprop.0+0x2a0/0x320
[ 3359.353972] print_report+0x108/0x1f8
[ 3359.353976] kasan_report+0x90/0xc8
[ 3359.353980] __asan_load1+0x60/0x70
print_address_description() is run with a raw_spinlock_t acquired and
interrupt disabled. find_vm_area() needs to acquire a spinlock_t which
becomes a sleeping lock in the RT kernel. IOW, we can't call
find_vm_area() in a RT kernel. Fix this bug report by skipping the
find_vm_area() call in this case and just print out the address as is.
For !RT kernel, follow the example set in commit 0cce06ba859a
("debugobjects,locking: Annotate debug_object_fill_pool() wait type
violation") and use DEFINE_WAIT_OVERRIDE_MAP() to avoid a spinlock_t
inside raw_spinlock_t warning.
Link: https://lkml.kernel.org/r/20250211160750.1301353-1-longman@redhat.com
Fixes: c056a364e954 ("kasan: print virtual mapping info in reports")
Signed-off-by: Waiman Long <longman(a)redhat.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Andrey Konovalov <andreyknvl(a)gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: Dmitriy Vyukov <dvyukov(a)google.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Mariano Pache <npache(a)redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kasan/report.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
--- a/mm/kasan/report.c~kasan-dont-call-find_vm_area-in-rt-kernel
+++ a/mm/kasan/report.c
@@ -398,9 +398,20 @@ static void print_address_description(vo
pr_err("\n");
}
- if (is_vmalloc_addr(addr)) {
- struct vm_struct *va = find_vm_area(addr);
+ if (!is_vmalloc_addr(addr))
+ goto print_page;
+ /*
+ * RT kernel cannot call find_vm_area() in atomic context.
+ * For !RT kernel, prevent spinlock_t inside raw_spinlock_t warning
+ * by raising wait-type to WAIT_SLEEP.
+ */
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
+ static DEFINE_WAIT_OVERRIDE_MAP(vmalloc_map, LD_WAIT_SLEEP);
+ struct vm_struct *va;
+
+ lock_map_acquire_try(&vmalloc_map);
+ va = find_vm_area(addr);
if (va) {
pr_err("The buggy address belongs to the virtual mapping at\n"
" [%px, %px) created by:\n"
@@ -410,8 +421,13 @@ static void print_address_description(vo
page = vmalloc_to_page(addr);
}
+ lock_map_release(&vmalloc_map);
+ } else {
+ pr_err("The buggy address %px belongs to a vmalloc virtual mapping\n",
+ addr);
}
+print_page:
if (page) {
pr_err("The buggy address belongs to the physical page:\n");
dump_page(page, "kasan: bad access detected");
_
Patches currently in -mm which might be from longman(a)redhat.com are
kasan-dont-call-find_vm_area-in-rt-kernel.patch