Re: [PATCH bpf-next 2/2] bpf/stackmap: fix A-A deadlock in bpf_get_stack()

10 Oct 2019

On 10/10/19 10:46 AM, Peter Zijlstra wrote:
...
On Thu, Oct 10, 2019 at 05:19:01PM +0000, Alexei Starovoitov wrote:
...
On 10/10/19 12:36 AM, Peter Zijlstra wrote:
...
On Wed, Oct 09, 2019 at 11:19:16PM -0700, Song Liu wrote:
...
bpf stackmap with build-id lookup (BPF_F_STACK_BUILD_ID) can trigger A-A
deadlock on rq_lock():
rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[...]
Call Trace:
   try_to_wake_up+0x1ad/0x590
   wake_up_q+0x54/0x80
   rwsem_wake+0x8a/0xb0
   bpf_get_stack+0x13c/0x150
   bpf_prog_fbdaf42eded9fe46_on_event+0x5e3/0x1000
   bpf_overflow_handler+0x60/0x100
   __perf_event_overflow+0x4f/0xf0
   perf_swevent_overflow+0x99/0xc0
   ___perf_sw_event+0xe7/0x120
   __schedule+0x47d/0x620
   schedule+0x29/0x90
   futex_wait_queue_me+0xb9/0x110
   futex_wait+0x139/0x230
   do_futex+0x2ac/0xa50
   __x64_sys_futex+0x13c/0x180
   do_syscall_64+0x42/0x100
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
...

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 052580c33d26..3b278f6b0c3e 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -287,7 +287,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
   	bool irq_work_busy = false;
   	struct stack_map_irq_work *work = NULL;

if (in_nmi()) {


if (in_nmi() || this_rq_is_locked()) {
	work = this_cpu_ptr(&up_read_work);
	if (work->irq_work.flags & IRQ_WORK_BUSY)
		/* cannot queue more up_read, fallback */

This is horrific crap. Just say no to that get_build_id_offset()
trainwreck.
this is not a helpful comment.
What issues do you see with this approach?
It will still generate deadlocks if I place a tracepoint inside a lock
that nests inside rq->lock, and it won't ever be able to detect that.
Say do the very same thing on trace_hrtimer_start(), which is under
cpu_base->lock, which nests inside rq->lock. That should give you an
AB-BA.
tracepoints / perf-overflow should _never_ take locks.
All of stack_map_get_build_id_offset() is just disguisting games; I did
tell you guys how to do lockless vma lookups a few years ago -- and yes,
that is invasive core mm surgery. But this is just disguisting hacks for
not wanting to do it right.
you mean speculative page fault stuff?
That was my hope as well and I offered Laurent all the help to land it.
Yet after a year since we've talked the patches are not any closer
to landing.
Any other 'invasive mm surgery' you have in mind?
...
Basically the only semi-sane thing to do with that trainwreck is
s/in_nmi()/true/ and pray.
On top of that I just hate buildids in general.
Emotions aside... build_id is useful and used in production.
It's used widely because it solves real problems.
This dead lock is from real servers and not from some sanitizer wannabe.
Hence we need to fix it as cleanly as possible and quickly.
s/in_nmi/true/ is certainly an option.
I'm worried about overhead of doing irq_work_queue() all the time.
But I'm not familiar with mechanism enough to justify the concerns.
Would it make sense to do s/in_nmi/irgs_disabled/ instead?

    

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH bpf-next 2/2] bpf/stackmap: fix A-A deadlock in bpf_get_stack()