Note: this applies to the 5.10 stable tree only. It doesn't trigger on
anything above 5.10, as the code there has been substantially reworked,
and as far as I can tell it doesn't apply to any stable kernel below
5.10 either.
Syzbot found a bug: KASAN: invalid-free in io_dismantle_req
https://syzkaller.appspot.com/bug?id=123d9a852fc88ba573ffcb2dbcf4f9576c3b05…
The test submits a bunch of io_uring writes and exits, which then triggers
__io_uring_task_cancel() and io_put_identity(), which in some corner cases
tries to free a static identity. This causes a panic, as shown in the
trace below:
BUG: KASAN: double-free or invalid-free in kfree+0xd5/0x310
CPU: 0 PID: 4618 Comm: repro Not tainted 5.10.76-05281-g4944ec82ebb9-dirty #17
Call Trace:
dump_stack_lvl+0x1b2/0x21b
print_address_description+0x8d/0x3b0
kasan_report_invalid_free+0x58/0x130
____kasan_slab_free+0x14b/0x170
__kasan_slab_free+0x11/0x20
slab_free_freelist_hook+0xcc/0x1a0
kfree+0xd5/0x310
io_dismantle_req+0x9b0/0xd90
io_do_iopoll+0x13a4/0x23e0
io_iopoll_try_reap_events+0x116/0x290
io_uring_cancel_task_requests+0x197d/0x1ee0
io_uring_flush+0x170/0x6d0
filp_close+0xb0/0x150
put_files_struct+0x1d4/0x350
exit_files+0x80/0xa0
do_exit+0x6d9/0x2390
do_group_exit+0x16a/0x2d0
get_signal+0x133e/0x1f80
arch_do_signal+0x7b/0x610
exit_to_user_mode_prepare+0xaa/0xe0
syscall_exit_to_user_mode+0x24/0x40
do_syscall_64+0x3d/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Allocated by task 4611:
____kasan_kmalloc+0xcd/0x100
__kasan_kmalloc+0x9/0x10
kmem_cache_alloc_trace+0x208/0x390
io_uring_alloc_task_context+0x57/0x550
io_uring_add_task_file+0x1f7/0x290
io_uring_create+0x2195/0x3490
__x64_sys_io_uring_setup+0x1bf/0x280
do_syscall_64+0x31/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at ffff88810732b500
which belongs to the cache kmalloc-192 of size 192
The buggy address is located 88 bytes inside of
192-byte region [ffff88810732b500, ffff88810732b5c0)
Kernel panic - not syncing: panic_on_warn set ...
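The invalid free happens in io_put_identity(), which in 5.10 looks
roughly like this (sketch from memory of the 5.10 sources; the first
check is supposed to skip the kfree() for the statically embedded
per-task identity):

static void io_put_identity(struct io_uring_task *tctx, struct io_kiocb *req)
{
	/* never free the static identity embedded in the task context */
	if (req->work.identity == &tctx->__identity)
		return;
	if (refcount_dec_and_test(&req->work.identity->count))
		kfree(req->work.identity);
}

Consistent with this, the KASAN report above shows the freed pointer
sitting 88 bytes inside an io_uring_alloc_task_context() allocation,
i.e. an embedded identity rather than a standalone kmalloc() object.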
This issue was bisected to this commit:
commit 186725a80c4e ("io_uring: fix skipping disabling sqo on exec")
Simply reverting the offending commit doesn't work, as it hits other,
related issues such as:
/* sqo_dead check is for when this happens after cancellation */
WARN_ON_ONCE(ctx->sqo_task == current && !ctx->sqo_dead &&
!xa_load(&tctx->xa, (unsigned long)file));
------------[ cut here ]------------
WARNING: CPU: 1 PID: 5622 at fs/io_uring.c:8960 io_uring_flush+0x5bc/0x6d0
Modules linked in:
CPU: 1 PID: 5622 Comm: repro Not tainted 5.10.76-05281-g4944ec82ebb9-dirty #16
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-6.fc35 04/01/2014
RIP: 0010:io_uring_flush+0x5bc/0x6d0
Call Trace:
filp_close+0xb0/0x150
put_files_struct+0x1d4/0x350
reset_files_struct+0x88/0xa0
bprm_execve+0x7f2/0x9f0
do_execveat_common+0x46f/0x5d0
__x64_sys_execve+0x92/0xb0
do_syscall_64+0x31/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Changing __io_uring_task_cancel() to call io_disable_sqo_submit()
directly, as the comment suggests, but only when __io_uring_files_cancel()
has not been executed, seems to fix the issue.
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: <io-uring(a)vger.kernel.org>
Cc: <linux-fsdevel(a)vger.kernel.org>
Cc: <linux-kernel(a)vger.kernel.org>
Cc: <stable(a)vger.kernel.org>
Reported-by: syzbot+6055980d041c8ac23307(a)syzkaller.appspotmail.com
Signed-off-by: Tadeusz Struk <tadeusz.struk(a)linaro.org>
---
fs/io_uring.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 0736487165da..fcf9ffe9b209 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8882,20 +8882,18 @@ void __io_uring_task_cancel(void)
struct io_uring_task *tctx = current->io_uring;
DEFINE_WAIT(wait);
s64 inflight;
+ int canceled = 0;
/* make sure overflow events are dropped */
atomic_inc(&tctx->in_idle);
- /* trigger io_disable_sqo_submit() */
- if (tctx->sqpoll)
- __io_uring_files_cancel(NULL);
-
do {
/* read completions before cancelations */
inflight = tctx_inflight(tctx);
if (!inflight)
break;
__io_uring_files_cancel(NULL);
+ canceled = 1;
prepare_to_wait(&tctx->wait, &wait, TASK_UNINTERRUPTIBLE);
@@ -8909,6 +8907,21 @@ void __io_uring_task_cancel(void)
finish_wait(&tctx->wait, &wait);
} while (1);
+ /*
+ * trigger io_disable_sqo_submit()
+ * if not already done by __io_uring_files_cancel()
+ */
+ if (tctx->sqpoll && !canceled) {
+ struct file *file;
+ unsigned long index;
+
+ xa_for_each(&tctx->xa, index, file) {
+ struct io_ring_ctx *ctx = file->private_data;
+
+ io_disable_sqo_submit(ctx);
+ }
+ }
+
atomic_dec(&tctx->in_idle);
io_uring_remove_task_files(tctx);
--
2.33.1
Commit 20b0dfa86bef0e80b41b0e5ac38b92f23b6f27f9 upstream.
The original commit depended on a rework commit (724fc856c09e ("drm/vc4:
hdmi: Split the CEC disable / enable functions in two")) that
(rightfully) didn't reach stable.
However, probably because the context changed, when the patch was
applied to stable the pm_runtime_put call got moved from the end of the
vc4_hdmi_cec_adap_enable function (which would have become
vc4_hdmi_cec_disable with the rework) to vc4_hdmi_cec_init.
This means that at probe time, we now drop our reference to the clocks
and power domains and thus end up with a CPU hang when the CPU tries to
access registers.
The call to pm_runtime_resume_and_get() is also problematic since the
.adap_enable CEC hook is called both to enable and to disable the
controller. That means that we'll now call pm_runtime_resume_and_get()
at disable time as well, messing with the reference counting.
The behaviour we should have, though, is for pm_runtime_resume_and_get()
to be called when the CEC controller is enabled, and pm_runtime_put()
when it's disabled.
We need to move things around a bit to behave that way, but it aligns
stable with upstream.
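A simplified skeleton of what the diff below implements, with the
register programming elided (sketch only):

static int vc4_hdmi_cec_adap_enable(struct cec_adapter *adap, bool enable)
{
	struct vc4_hdmi *vc4_hdmi = cec_get_drvdata(adap);
	int ret;

	if (enable) {
		/* take the runtime PM reference only on the enable path */
		ret = pm_runtime_resume_and_get(&vc4_hdmi->pdev->dev);
		if (ret)
			return ret;

		/* ... reset and program the CEC block ... */
	} else {
		/* ... mask interrupts and hold the block in reset ... */

		/* drop the reference taken at enable time */
		pm_runtime_put(&vc4_hdmi->pdev->dev);
	}

	return 0;
}

This balances get/put across an enable/disable cycle and removes the
premature pm_runtime_put() from vc4_hdmi_cec_init().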
Cc: <stable(a)vger.kernel.org> # 5.10.x
Cc: <stable(a)vger.kernel.org> # 5.15.x
Cc: <stable(a)vger.kernel.org> # 5.16.x
Reported-by: Michael Stapelberg <michael+drm(a)stapelberg.ch>
Signed-off-by: Maxime Ripard <maxime(a)cerno.tech>
---
drivers/gpu/drm/vc4/vc4_hdmi.c | 27 ++++++++++++++-------------
1 file changed, 14 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c
index 8465914892fa..e6aad838065b 100644
--- a/drivers/gpu/drm/vc4/vc4_hdmi.c
+++ b/drivers/gpu/drm/vc4/vc4_hdmi.c
@@ -1739,18 +1739,18 @@ static int vc4_hdmi_cec_adap_enable(struct cec_adapter *adap, bool enable)
u32 val;
int ret;
- ret = pm_runtime_resume_and_get(&vc4_hdmi->pdev->dev);
- if (ret)
- return ret;
-
- val = HDMI_READ(HDMI_CEC_CNTRL_5);
- val &= ~(VC4_HDMI_CEC_TX_SW_RESET | VC4_HDMI_CEC_RX_SW_RESET |
- VC4_HDMI_CEC_CNT_TO_4700_US_MASK |
- VC4_HDMI_CEC_CNT_TO_4500_US_MASK);
- val |= ((4700 / usecs) << VC4_HDMI_CEC_CNT_TO_4700_US_SHIFT) |
- ((4500 / usecs) << VC4_HDMI_CEC_CNT_TO_4500_US_SHIFT);
-
if (enable) {
+ ret = pm_runtime_resume_and_get(&vc4_hdmi->pdev->dev);
+ if (ret)
+ return ret;
+
+ val = HDMI_READ(HDMI_CEC_CNTRL_5);
+ val &= ~(VC4_HDMI_CEC_TX_SW_RESET | VC4_HDMI_CEC_RX_SW_RESET |
+ VC4_HDMI_CEC_CNT_TO_4700_US_MASK |
+ VC4_HDMI_CEC_CNT_TO_4500_US_MASK);
+ val |= ((4700 / usecs) << VC4_HDMI_CEC_CNT_TO_4700_US_SHIFT) |
+ ((4500 / usecs) << VC4_HDMI_CEC_CNT_TO_4500_US_SHIFT);
+
HDMI_WRITE(HDMI_CEC_CNTRL_5, val |
VC4_HDMI_CEC_TX_SW_RESET | VC4_HDMI_CEC_RX_SW_RESET);
HDMI_WRITE(HDMI_CEC_CNTRL_5, val);
@@ -1778,7 +1778,10 @@ static int vc4_hdmi_cec_adap_enable(struct cec_adapter *adap, bool enable)
HDMI_WRITE(HDMI_CEC_CPU_MASK_SET, VC4_HDMI_CPU_CEC);
HDMI_WRITE(HDMI_CEC_CNTRL_5, val |
VC4_HDMI_CEC_TX_SW_RESET | VC4_HDMI_CEC_RX_SW_RESET);
+
+ pm_runtime_put(&vc4_hdmi->pdev->dev);
}
+
return 0;
}
@@ -1889,8 +1892,6 @@ static int vc4_hdmi_cec_init(struct vc4_hdmi *vc4_hdmi)
if (ret < 0)
goto err_remove_handlers;
- pm_runtime_put(&vc4_hdmi->pdev->dev);
-
return 0;
err_remove_handlers:
--
2.34.1
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e464121f2d40eabc7d11823fb26db807ce945df4 Mon Sep 17 00:00:00 2001
From: Tony Luck <tony.luck(a)intel.com>
Date: Fri, 21 Jan 2022 09:47:38 -0800
Subject: [PATCH] x86/cpu: Add Xeon Icelake-D to list of CPUs that support PPIN
Missed adding the Icelake-D CPU to the list. It uses the same MSRs
to control and read the inventory number as all the other models.
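For context, the kernel probes PPIN support per-model by poking the
control MSR; inside intel_ppin_init(struct cpuinfo_x86 *c), each listed
model runs roughly the following (sketch, simplified from
arch/x86/kernel/cpu/mce/intel.c):

unsigned long long val;

if (rdmsrl_safe(MSR_PPIN_CTL, &val))
	return;

/* PPIN locked in disabled mode */
if ((val & 3UL) == 1UL)
	return;

/* if PPIN is disabled, try to enable it */
if (!(val & 2UL)) {
	wrmsrl_safe(MSR_PPIN_CTL, val | 2UL);
	rdmsrl_safe(MSR_PPIN_CTL, &val);
}

/* is the enable bit set? */
if (val & 2UL)
	set_cpu_cap(c, X86_FEATURE_INTEL_PPIN);

Once the feature bit is set, the inventory number itself is read from
MSR_PPIN; Icelake-D only needs its INTEL_FAM6_ICELAKE_D case label
added to the model switch, which is all the diff below does.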
Fixes: dc6b025de95b ("x86/mce: Add Xeon Icelake to list of CPUs that support PPIN")
Reported-by: Ailin Xu <ailin.xu(a)intel.com>
Signed-off-by: Tony Luck <tony.luck(a)intel.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20220121174743.1875294-2-tony.luck@intel.com
diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index bb9a46a804bf..baafbb37be67 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -486,6 +486,7 @@ static void intel_ppin_init(struct cpuinfo_x86 *c)
case INTEL_FAM6_BROADWELL_X:
case INTEL_FAM6_SKYLAKE_X:
case INTEL_FAM6_ICELAKE_X:
+ case INTEL_FAM6_ICELAKE_D:
case INTEL_FAM6_SAPPHIRERAPIDS_X:
case INTEL_FAM6_XEON_PHI_KNL:
case INTEL_FAM6_XEON_PHI_KNM:
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c5de60cd622a2607c043ba65e25a6e9998a369f9 Mon Sep 17 00:00:00 2001
From: Namhyung Kim <namhyung(a)kernel.org>
Date: Mon, 24 Jan 2022 11:58:08 -0800
Subject: [PATCH] perf/core: Fix cgroup event list management
The active cgroup events are managed in the per-cpu cgrp_cpuctx_list.
This list is only accessed from current cpu and not protected by any
locks. But from the commit ef54c1a476ae ("perf: Rework
perf_event_exit_event()"), it's possible to access (actually modify)
the list from another cpu.
In the perf_remove_from_context(), it can remove an event from the
context without an IPI when the context is not active. This is not
safe with cgroup events which can have some active events in the
context even if ctx->is_active is 0 at the moment. The target cpu
might be in the middle of list iteration at the same time.
If the event is enabled when it's about to be closed, it might call
perf_cgroup_event_disable() and list_del() with the cgrp_cpuctx_list
on a different cpu.
This resulted in a crash due to an invalid list pointer access during
the cgroup list traversal on the cpu which the event belongs to.
Let's fall back to the IPI to access the cgrp_cpuctx_list from that cpu.
Similarly, perf_install_in_context() should use IPI for the cgroup
events too.
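For reference, the predicate the fix relies on is trivial; with
CONFIG_CGROUP_PERF it is (roughly, from kernel/events/core.c):

static inline int is_cgroup_event(struct perf_event *event)
{
	return event->cgrp != NULL;
}

so the hunks below simply force the IPI path whenever the event is
attached to a cgroup, regardless of ctx->is_active or the
disabled-event fast path.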
Fixes: ef54c1a476ae ("perf: Rework perf_event_exit_event()")
Signed-off-by: Namhyung Kim <namhyung(a)kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lkml.kernel.org/r/20220124195808.2252071-1-namhyung@kernel.org
diff --git a/kernel/events/core.c b/kernel/events/core.c
index b1c1928c0e7c..76c754e45d01 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2462,7 +2462,11 @@ static void perf_remove_from_context(struct perf_event *event, unsigned long fla
* event_function_call() user.
*/
raw_spin_lock_irq(&ctx->lock);
- if (!ctx->is_active) {
+ /*
+ * Cgroup events are per-cpu events, and must IPI because of
+ * cgrp_cpuctx_list.
+ */
+ if (!ctx->is_active && !is_cgroup_event(event)) {
__perf_remove_from_context(event, __get_cpu_context(ctx),
ctx, (void *)flags);
raw_spin_unlock_irq(&ctx->lock);
@@ -2895,11 +2899,14 @@ perf_install_in_context(struct perf_event_context *ctx,
* perf_event_attr::disabled events will not run and can be initialized
* without IPI. Except when this is the first event for the context, in
* that case we need the magic of the IPI to set ctx->is_active.
+ * Similarly, cgroup events for the context also needs the IPI to
+ * manipulate the cgrp_cpuctx_list.
*
* The IOC_ENABLE that is sure to follow the creation of a disabled
* event will issue the IPI and reprogram the hardware.
*/
- if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF && ctx->nr_events) {
+ if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF &&
+ ctx->nr_events && !is_cgroup_event(event)) {
raw_spin_lock_irq(&ctx->lock);
if (ctx->task == TASK_TOMBSTONE) {
raw_spin_unlock_irq(&ctx->lock);