The patch titled
Subject: kcov: mark in_softirq_really() as __always_inline
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kcov-mark-in_softirq_really-as-__always_inline.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Arnd Bergmann <arnd(a)arndb.de>
Subject: kcov: mark in_softirq_really() as __always_inline
Date: Tue, 17 Dec 2024 08:18:10 +0100
If gcc decides not to inline in_softirq_really(), objtool warns about a
function call with UACCESS enabled:
kernel/kcov.o: warning: objtool: __sanitizer_cov_trace_pc+0x1e: call to in_softirq_really() with UACCESS enabled
kernel/kcov.o: warning: objtool: check_kcov_mode+0x11: call to in_softirq_really() with UACCESS enabled
Mark this as __always_inline to avoid the problem.
Link: https://lkml.kernel.org/r/20241217071814.2261620-1-arnd@kernel.org
Fixes: 7d4df2dad312 ("kcov: properly check for softirq context")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Reviewed-by: Marco Elver <elver(a)google.com>
Cc: Aleksandr Nogikh <nogikh(a)google.com>
Cc: Andrey Konovalov <andreyknvl(a)gmail.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Josh Poimboeuf <jpoimboe(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/kcov.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/kcov.c~kcov-mark-in_softirq_really-as-__always_inline
+++ a/kernel/kcov.c
@@ -166,7 +166,7 @@ static void kcov_remote_area_put(struct
* Unlike in_serving_softirq(), this function returns false when called during
* a hardirq or an NMI that happened in the softirq context.
*/
-static inline bool in_softirq_really(void)
+static __always_inline bool in_softirq_really(void)
{
return in_serving_softirq() && !in_hardirq() && !in_nmi();
}
_
Patches currently in -mm which might be from arnd(a)arndb.de are
kcov-mark-in_softirq_really-as-__always_inline.patch
The patch titled
Subject: mm/kmemleak: fix sleeping function called from invalid context at print message
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-kmemleak-fix-sleeping-function-called-from-invalid-context-at-print-message.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Alessandro Carminati <acarmina(a)redhat.com>
Subject: mm/kmemleak: fix sleeping function called from invalid context at print message
Date: Tue, 17 Dec 2024 14:20:33 +0000
Address a bug in the kernel that triggers a "sleeping function called from
invalid context" warning when /sys/kernel/debug/kmemleak is printed under
specific conditions:
- CONFIG_PREEMPT_RT=y
- Set SELinux as the LSM for the system
- Set kptr_restrict to 1
- kmemleak buffer contains at least one item
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 136, name: cat
preempt_count: 1, expected: 0
RCU nest depth: 2, expected: 2
6 locks held by cat/136:
#0: ffff32e64bcbf950 (&p->lock){+.+.}-{3:3}, at: seq_read_iter+0xb8/0xe30
#1: ffffafe6aaa9dea0 (scan_mutex){+.+.}-{3:3}, at: kmemleak_seq_start+0x34/0x128
#3: ffff32e6546b1cd0 (&object->lock){....}-{2:2}, at: kmemleak_seq_show+0x3c/0x1e0
#4: ffffafe6aa8d8560 (rcu_read_lock){....}-{1:2}, at: has_ns_capability_noaudit+0x8/0x1b0
#5: ffffafe6aabbc0f8 (notif_lock){+.+.}-{2:2}, at: avc_compute_av+0xc4/0x3d0
irq event stamp: 136660
hardirqs last enabled at (136659): [<ffffafe6a80fd7a0>] _raw_spin_unlock_irqrestore+0xa8/0xd8
hardirqs last disabled at (136660): [<ffffafe6a80fd85c>] _raw_spin_lock_irqsave+0x8c/0xb0
softirqs last enabled at (0): [<ffffafe6a5d50b28>] copy_process+0x11d8/0x3df8
softirqs last disabled at (0): [<0000000000000000>] 0x0
Preemption disabled at:
[<ffffafe6a6598a4c>] kmemleak_seq_show+0x3c/0x1e0
CPU: 1 UID: 0 PID: 136 Comm: cat Tainted: G E 6.11.0-rt7+ #34
Tainted: [E]=UNSIGNED_MODULE
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0xa0/0x128
show_stack+0x1c/0x30
dump_stack_lvl+0xe8/0x198
dump_stack+0x18/0x20
rt_spin_lock+0x8c/0x1a8
avc_perm_nonode+0xa0/0x150
cred_has_capability.isra.0+0x118/0x218
selinux_capable+0x50/0x80
security_capable+0x7c/0xd0
has_ns_capability_noaudit+0x94/0x1b0
has_capability_noaudit+0x20/0x30
restricted_pointer+0x21c/0x4b0
pointer+0x298/0x760
vsnprintf+0x330/0xf70
seq_printf+0x178/0x218
print_unreferenced+0x1a4/0x2d0
kmemleak_seq_show+0xd0/0x1e0
seq_read_iter+0x354/0xe30
seq_read+0x250/0x378
full_proxy_read+0xd8/0x148
vfs_read+0x190/0x918
ksys_read+0xf0/0x1e0
__arm64_sys_read+0x70/0xa8
invoke_syscall.constprop.0+0xd4/0x1d8
el0_svc+0x50/0x158
el0t_64_sync+0x17c/0x180
%pS and %pK, in the same back trace line, are redundant, and %pS can void
%pK service in certain contexts.
%pS alone already provides the necessary information, and if it cannot
resolve the symbol, it falls back to printing the raw address voiding
the original intent behind the %pK.
Additionally, %pK requires a privilege check CAP_SYSLOG enforced through
the LSM, which can trigger a "sleeping function called from invalid
context" warning under RT_PREEMPT kernels when the check occurs in an
atomic context. This issue may also affect other LSMs.
This change avoids the unnecessary privilege check and resolves the
sleeping function warning without any loss of information.
Link: https://lkml.kernel.org/r/20241217142032.55793-1-acarmina@redhat.com
Fixes: 3a6f33d86baa ("mm/kmemleak: use %pK to display kernel pointers in backtrace")
Signed-off-by: Alessandro Carminati <acarmina(a)redhat.com>
Cc: Cl��ment L��ger <clement.leger(a)bootlin.com>
Cc: Alessandro Carminati <acarmina(a)redhat.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Eric Chanudet <echanude(a)redhat.com>
Cc: Gabriele Paoloni <gpaoloni(a)redhat.com>
Cc: Juri Lelli <juri.lelli(a)redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Thomas Wei��schuh <thomas.weissschuh(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kmemleak.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/kmemleak.c~mm-kmemleak-fix-sleeping-function-called-from-invalid-context-at-print-message
+++ a/mm/kmemleak.c
@@ -373,7 +373,7 @@ static void print_unreferenced(struct se
for (i = 0; i < nr_entries; i++) {
void *ptr = (void *)entries[i];
- warn_or_seq_printf(seq, " [<%pK>] %pS\n", ptr, ptr);
+ warn_or_seq_printf(seq, " %pS\n", ptr);
}
}
_
Patches currently in -mm which might be from acarmina(a)redhat.com are
mm-kmemleak-fix-sleeping-function-called-from-invalid-context-at-print-message.patch
From: Steven Rostedt <rostedt(a)goodmis.org>
The TP_printk() of a TRACE_EVENT() is a generic printf format that any
developer can create for their event. It may include pointers to strings
and such. A boot mapped buffer may contain data from a previous kernel
where the strings addresses are different.
One solution is to copy the event content and update the pointers by the
recorded delta, but a simpler solution (for now) is to just use the
print_fields() function to print these events. The print_fields() function
just iterates the fields and prints them according to what type they are,
and ignores the TP_printk() format from the event itself.
To understand the difference, when printing via TP_printk() the output
looks like this:
4582.696626: kmem_cache_alloc: call_site=getname_flags+0x47/0x1f0 ptr=00000000e70e10e0 bytes_req=4096 bytes_alloc=4096 gfp_flags=GFP_KERNEL node=-1 accounted=false
4582.696629: kmem_cache_alloc: call_site=alloc_empty_file+0x6b/0x110 ptr=0000000095808002 bytes_req=360 bytes_alloc=384 gfp_flags=GFP_KERNEL node=-1 accounted=false
4582.696630: kmem_cache_alloc: call_site=security_file_alloc+0x24/0x100 ptr=00000000576339c3 bytes_req=16 bytes_alloc=16 gfp_flags=GFP_KERNEL|__GFP_ZERO node=-1 accounted=false
4582.696653: kmem_cache_free: call_site=do_sys_openat2+0xa7/0xd0 ptr=00000000e70e10e0 name=names_cache
But when printing via print_fields() (echo 1 > /sys/kernel/tracing/options/fields)
the same event output looks like this:
4582.696626: kmem_cache_alloc: call_site=0xffffffff92d10d97 (-1831793257) ptr=0xffff9e0e8571e000 (-107689771147264) bytes_req=0x1000 (4096) bytes_alloc=0x1000 (4096) gfp_flags=0xcc0 (3264) node=0xffffffff (-1) accounted=(0)
4582.696629: kmem_cache_alloc: call_site=0xffffffff92d0250b (-1831852789) ptr=0xffff9e0e8577f800 (-107689770747904) bytes_req=0x168 (360) bytes_alloc=0x180 (384) gfp_flags=0xcc0 (3264) node=0xffffffff (-1) accounted=(0)
4582.696630: kmem_cache_alloc: call_site=0xffffffff92efca74 (-1829778828) ptr=0xffff9e0e8d35d3b0 (-107689640864848) bytes_req=0x10 (16) bytes_alloc=0x10 (16) gfp_flags=0xdc0 (3520) node=0xffffffff (-1) accounted=(0)
4582.696653: kmem_cache_free: call_site=0xffffffff92cfbea7 (-1831879001) ptr=0xffff9e0e8571e000 (-107689771147264) name=names_cache
The print_fields() needed one update to handle this, and that's to add the
delta to the pointer strings. It also needs to handle %pS, but that is out
of scope of this fix. Currently, it only prints the raw address.
Ftrace events like stack trace and function tracing have their own methods
to print and already can handle the deltas. Those event types are less
than __TRACE_LAST_TYPE. If the event type is greater than that, then the
print_fields() output is forced.
Cc: stable(a)vger.kernel.org
Fixes: 07714b4bb3f98 ("tracing: Handle old buffer mappings for event strings and functions")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 9 +++++++++
kernel/trace/trace_output.c | 3 ++-
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index be62f0ea1814..6581cb2bc67f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4353,6 +4353,15 @@ static enum print_line_t print_trace_fmt(struct trace_iterator *iter)
if (event) {
if (tr->trace_flags & TRACE_ITER_FIELDS)
return print_event_fields(iter, event);
+ /*
+ * For TRACE_EVENT() events, the print_fmt is not
+ * safe to use if the array has delta offsets
+ * Force printing via the fields.
+ */
+ if ((tr->text_delta || tr->data_delta) &&
+ event->type > __TRACE_LAST_TYPE)
+ return print_event_fields(iter, event);
+
return event->funcs->trace(iter, sym_flags, event);
}
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index da748b7cbc4d..0a5d12dd860f 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -853,6 +853,7 @@ static void print_fields(struct trace_iterator *iter, struct trace_event_call *c
struct list_head *head)
{
struct ftrace_event_field *field;
+ long delta = iter->tr->text_delta;
int offset;
int len;
int ret;
@@ -889,7 +890,7 @@ static void print_fields(struct trace_iterator *iter, struct trace_event_call *c
case FILTER_PTR_STRING:
if (!iter->fmt_size)
trace_iter_expand_format(iter);
- pos = *(void **)pos;
+ pos = (*(void **)pos) + delta;
ret = strncpy_from_kernel_nofault(iter->fmt, pos,
iter->fmt_size);
if (ret < 0)
--
2.45.2
From: Steven Rostedt <rostedt(a)goodmis.org>
When a ring buffer is mapped across boots, an delta is saved between the
addresses of the previous kernel and the current kernel. But this does not
handle module events nor dynamic events.
Simply do not create module or dynamic events to a boot mapped instance.
This will keep them from ever being enabled and therefore not part of the
previous kernel trace.
Cc: stable(a)vger.kernel.org
Fixes: e645535a954ad ("tracing: Add option to use memmapped memory for trace boot instance")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace_events.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 77e68efbd43e..d6359318d5c1 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2984,6 +2984,12 @@ trace_create_new_event(struct trace_event_call *call,
if (!event_in_systems(call, tr->system_names))
return NULL;
+ /* Boot mapped instances cannot use modules or dynamic events */
+ if (tr->flags & TRACE_ARRAY_FL_BOOT) {
+ if ((call->flags & TRACE_EVENT_FL_DYNAMIC) || call->module)
+ return NULL;
+ }
+
file = kmem_cache_alloc(file_cachep, GFP_TRACE);
if (!file)
return ERR_PTR(-ENOMEM);
--
2.45.2
As per memory map table, the region for PCIe6a is 64MByte. Hence, set the
size of 32 bit non-prefetchable memory region beginning on address
0x70300000 as 0x3d00000 so that BAR space assigned to BAR registers can be
allocated from 0x70300000 to 0x74000000.
Fixes: 7af141850012 ("arm64: dts: qcom: x1e80100: Fix up BAR spaces")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qiang Yu <quic_qianyu(a)quicinc.com>
---
arch/arm64/boot/dts/qcom/x1e80100.dtsi | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100.dtsi b/arch/arm64/boot/dts/qcom/x1e80100.dtsi
index f044921457d0..90ddac606719 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100.dtsi
+++ b/arch/arm64/boot/dts/qcom/x1e80100.dtsi
@@ -3126,7 +3126,7 @@ pcie6a: pci@1bf8000 {
#address-cells = <3>;
#size-cells = <2>;
ranges = <0x01000000 0x0 0x00000000 0x0 0x70200000 0x0 0x100000>,
- <0x02000000 0x0 0x70300000 0x0 0x70300000 0x0 0x1d00000>;
+ <0x02000000 0x0 0x70300000 0x0 0x70300000 0x0 0x3d00000>;
bus-range = <0x00 0xff>;
dma-coherent;
--
2.34.1
File-scope "__pmic_glink_lock" mutex protects the filke-scope
"__pmic_glink", thus reference to it should be obtained under the lock,
just like pmic_glink_rpmsg_remove() is doing. Otherwise we have a race
during if PMIC GLINK device removal: the pmic_glink_rpmsg_probe()
function could store local reference before mutex in driver removal is
acquired.
Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
---
Changes in v2:
1. None
---
drivers/soc/qcom/pmic_glink.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c
index 9606222993fd..452f30a9354d 100644
--- a/drivers/soc/qcom/pmic_glink.c
+++ b/drivers/soc/qcom/pmic_glink.c
@@ -217,10 +217,11 @@ static void pmic_glink_pdr_callback(int state, char *svc_path, void *priv)
static int pmic_glink_rpmsg_probe(struct rpmsg_device *rpdev)
{
- struct pmic_glink *pg = __pmic_glink;
+ struct pmic_glink *pg;
int ret = 0;
mutex_lock(&__pmic_glink_lock);
+ pg = __pmic_glink;
if (!pg) {
ret = dev_err_probe(&rpdev->dev, -ENODEV, "no pmic_glink device to attach to\n");
goto out_unlock;
--
2.43.0
On Wed, Oct 16, 2024 at 03:10:34PM +0200, Christian Brauner wrote:
> On Fri, 26 Apr 2024 16:05:48 +0800, Xuewen Yan wrote:
> > Now, the epoll only use wake_up() interface to wake up task.
> > However, sometimes, there are epoll users which want to use
> > the synchronous wakeup flag to hint the scheduler, such as
> > Android binder driver.
> > So add a wake_up_sync() define, and use the wake_up_sync()
> > when the sync is true in ep_poll_callback().
> >
> > [...]
>
> Applied to the vfs.misc branch of the vfs/vfs.git tree.
> Patches in the vfs.misc branch should appear in linux-next soon.
>
> Please report any outstanding bugs that were missed during review in a
> new review to the original patch series allowing us to drop it.
>
> It's encouraged to provide Acked-bys and Reviewed-bys even though the
> patch has now been applied. If possible patch trailers will be updated.
>
> Note that commit hashes shown below are subject to change due to rebase,
> trailer updates or similar. If in doubt, please check the listed branch.
>
> tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
> branch: vfs.misc
This is a bug that's been present for all of time, so I think we should:
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable(a)vger.kernel.org
I sent a patch which adds a benchmark for nonblocking pipes using epoll:
https://lore.kernel.org/lkml/20241016190009.866615-1-bgeffon@google.com/
Using this new benchmark I get the following results without this fix
and with this fix:
$ tools/perf/perf bench sched pipe -n
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes
Total time: 12.194 [sec]
12.194376 usecs/op
82005 ops/sec
$ tools/perf/perf bench sched pipe -n
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes
Total time: 9.229 [sec]
9.229738 usecs/op
108345 ops/sec
>
> [1/1] epoll: Add synchronous wakeup support for ep_poll_callback
> https://git.kernel.org/vfs/vfs/c/2ce0e17660a7
From: Steven Rostedt <rostedt(a)goodmis.org>
A bug was discovered where the idle shadow stacks were not initialized
for offline CPUs when starting function graph tracer, and when they came
online they were not traced due to the missing shadow stack. To fix
this, the idle task shadow stack initialization was moved to using the
CPU hotplug callbacks. But it removed the initialization when the
function graph was enabled. The problem here is that the hotplug
callbacks are called when the CPUs come online, but the idle shadow
stack initialization only happens if function graph is currently
active. This caused the online CPUs to not get their shadow stack
initialized.
The idle shadow stack initialization still needs to be done when the
function graph is registered, as they will not be allocated if function
graph is not registered.
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Link: https://lore.kernel.org/20241211135335.094ba282@batman.local.home
Fixes: 2c02f7375e65 ("fgraph: Use CPU hotplug mechanism to initialize idle shadow stacks")
Reported-by: Linus Walleij <linus.walleij(a)linaro.org>
Tested-by: Linus Walleij <linus.walleij(a)linaro.org>
Closes: https://lore.kernel.org/all/CACRpkdaTBrHwRbbrphVy-=SeDz6MSsXhTKypOtLrTQ+DgG…
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/fgraph.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 0bf78517b5d4..ddedcb50917f 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -1215,7 +1215,7 @@ void fgraph_update_pid_func(void)
static int start_graph_tracing(void)
{
unsigned long **ret_stack_list;
- int ret;
+ int ret, cpu;
ret_stack_list = kcalloc(FTRACE_RETSTACK_ALLOC_SIZE,
sizeof(*ret_stack_list), GFP_KERNEL);
@@ -1223,6 +1223,12 @@ static int start_graph_tracing(void)
if (!ret_stack_list)
return -ENOMEM;
+ /* The cpu_boot init_task->ret_stack will never be freed */
+ for_each_online_cpu(cpu) {
+ if (!idle_task(cpu)->ret_stack)
+ ftrace_graph_init_idle_task(idle_task(cpu), cpu);
+ }
+
do {
ret = alloc_retstack_tasklist(ret_stack_list);
} while (ret == -EAGAIN);
--
2.45.2