On Thu, Mar 25, 2021 at 11:17AM +0100, Marco Elver wrote:
On Wed, Mar 24, 2021 at 12:24PM +0100, Marco Elver wrote:
From: Peter Zijlstra <peterz@infradead.org>
Make perf_event_exit_event() more robust, such that we can use it from other contexts. Specifically the up and coming remove_on_exec.
For this to work we need to address a few issues. Remove_on_exec will not destroy the entire context, so we cannot rely on TASK_TOMBSTONE to disable event_function_call() and we thus have to use perf_remove_from_context().
When using perf_remove_from_context(), there are two races to consider. The first is against close(), where we can have concurrent tear-down of the event. The second is against child_list iteration, which should not find a half-baked event.
To address this, teach perf_remove_from_context() to special-case !ctx->is_active and teach it about DETACH_CHILD.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
v3:
- New dependency for series: https://lkml.kernel.org/r/YFn/I3aKF+TOjGcl@hirez.programming.kicks-ass.net
syzkaller found a crash with stack trace pointing at changes in this patch. Can't tell if this is an old issue or introduced in this series.
Yay, I found a reproducer. v5.12-rc4 is good, and sadly with this patch only we crash. :-/
Here's a stacktrace with just this patch applied:
| BUG: kernel NULL pointer dereference, address: 00000000000007af
| #PF: supervisor read access in kernel mode
| #PF: error_code(0x0000) - not-present page
| PGD 0 P4D 0
| Oops: 0000 [#1] PREEMPT SMP PTI
| CPU: 7 PID: 465 Comm: a.out Not tainted 5.12.0-rc4+ #25
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
| RIP: 0010:task_pid_ptr kernel/pid.c:324 [inline]
| RIP: 0010:__task_pid_nr_ns+0x112/0x240 kernel/pid.c:500
| Code: e8 13 55 07 00 e8 1e a6 0e 00 48 c7 c6 83 1e 0b 81 48 c7 c7 a0 2e d5 82 e8 4b 08 04 00 44 89 e0 5b 5d 41 5c c3 e8 fe a5 0e 00 <48> 8b 85 b0 07 00 00 4a 8d ac e0 98 01 00 00 e9 5a ff ff ff e8 e5
| RSP: 0000:ffffc90001b73a60 EFLAGS: 00010093
| RAX: 0000000000000000 RBX: ffffffff82c69820 RCX: ffffffff810b1eb2
| RDX: ffff888108d143c0 RSI: 0000000000000000 RDI: ffffffff8299ccc6
| RBP: ffffffffffffffff R08: 0000000000000001 R09: 0000000000000000
| R10: ffff888108d14db8 R11: 0000000000000000 R12: 0000000000000001
| R13: ffffffffffffffff R14: ffffffffffffffff R15: ffff888108e05240
| FS: 0000000000000000(0000) GS:ffff88842fdc0000(0000) knlGS:0000000000000000
| CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| CR2: 00000000000007af CR3: 0000000002c22002 CR4: 0000000000770ee0
| DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
| DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
| PKRU: 55555554
| Call Trace:
| perf_event_pid_type kernel/events/core.c:1412 [inline]
| perf_event_pid kernel/events/core.c:1421 [inline]
| perf_event_read_event+0x78/0x1d0 kernel/events/core.c:7406
| sync_child_event kernel/events/core.c:12404 [inline]
| perf_child_detach kernel/events/core.c:2223 [inline]
| __perf_remove_from_context+0x14d/0x280 kernel/events/core.c:2359
| perf_remove_from_context+0x9f/0xf0 kernel/events/core.c:2395
| perf_event_exit_event kernel/events/core.c:12442 [inline]
| perf_event_exit_task_context kernel/events/core.c:12523 [inline]
| perf_event_exit_task+0x276/0x4c0 kernel/events/core.c:12556
| do_exit+0x4cd/0xed0 kernel/exit.c:834
| do_group_exit+0x4d/0xf0 kernel/exit.c:922
| get_signal+0x1d2/0xf30 kernel/signal.c:2777
| arch_do_signal_or_restart+0xf7/0x750 arch/x86/kernel/signal.c:789
| handle_signal_work kernel/entry/common.c:147 [inline]
| exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
| exit_to_user_mode_prepare+0x113/0x190 kernel/entry/common.c:208
| irqentry_exit_to_user_mode+0x6/0x30 kernel/entry/common.c:314
| asm_exc_general_protection+0x1e/0x30 arch/x86/include/asm/idtentry.h:571
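A side note on reading the trace, offered as an observation rather than a confirmed diagnosis: the faulting bytes "<48> 8b 85 b0 07 00 00" decode to mov 0x7b0(%rbp),%rax, and RBP is ffffffffffffffff. That is the value of TASK_TOMBSTONE, which kernel/events/core.c defines as ((void *)-1L), so one plausible reading is that __task_pid_nr_ns() was handed a tombstoned task pointer. The user-space sketch below only checks the address arithmetic; TOMBSTONE_VALUE, FAULT_DISP, effective_address() and print_fault_address() are names made up for illustration, not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: same bit pattern as the kernel's TASK_TOMBSTONE,
 * i.e. ((void *)-1L) on a 64-bit build. Matches RBP in the trace. */
#define TOMBSTONE_VALUE ((uint64_t)-1)

/* Displacement taken from the faulting instruction bytes
 * "48 8b 85 b0 07 00 00" (mov 0x7b0(%rbp),%rax). */
#define FAULT_DISP 0x7b0ULL

/* Effective address of base + disp, with 64-bit wraparound
 * exactly as the CPU computes it. */
uint64_t effective_address(uint64_t base, uint64_t disp)
{
	return base + disp;
}

/* Print the address the load would touch, in the oops format. */
void print_fault_address(void)
{
	uint64_t addr = effective_address(TOMBSTONE_VALUE, FAULT_DISP);

	printf("%016llx\n", (unsigned long long)addr);
}
```

Calling print_fault_address() prints 00000000000007af, which matches both the reported fault address and CR2. That is consistent with a dereference through a pointer equal to -1, but it does not by itself say which caller passed it.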
Attached is a C reproducer of the syzkaller program that crashes us.
Thanks,
-- Marco