On Thu, Oct 10, 2019 at 06:06:14PM +0000, Alexei Starovoitov wrote:
On 10/10/19 10:46 AM, Peter Zijlstra wrote:
All of stack_map_get_build_id_offset() is just disguisting games; I did tell you guys how to do lockless vma lookups a few years ago -- and yes, that is invasive core mm surgery. But this is just disguisting hacks for not wanting to do it right.
you mean speculative page fault stuff? That was my hope as well and I offered Laurent all the help to land it. Yet after a year since we've talked the patches are not any closer to landing. Any other 'invasive mm surgery' you have in mind?
Indeed that series. It had RCU managed VMAs and lockless VMA lookups, which is exactly what you need here.
Basically the only semi-sane thing to do with that trainwreck is s/in_nmi()/true/ and pray.
On top of that I just hate buildids in general.
Emotions aside... build_id is useful and used in production. It's used widely because it solves real problems.
AFAIU it solves the problem of you not knowing what version of the binary runs where; which I was hoping your cloud infrastructure thing would actually know already.
Anyway, I know what it does, I just don't nessecarily agree it is the right way around that particular problem (also, the way I'm personally affected is that perf-record is dead slow by default due to built-id post processing).
And it obviously leads to horrible hacks like the code currently under discussion :/
This dead lock is from real servers and not from some sanitizer wannabe.
If you enable CFS bandwidth control and run this function on the trace_hrtimer_start() tracepoint, you should be able to trigger a real AB-BA lockup.
Hence we need to fix it as cleanly as possible and quickly. s/in_nmi/true/ is certainly an option.
That is the best option; because tracepoints / perf-overflow handlers really should not be taking any locks.
I'm worried about overhead of doing irq_work_queue() all the time. But I'm not familiar with mechanism enough to justify the concerns. Would it make sense to do s/in_nmi/irgs_disabled/ instead?
irqs_disabled() should work in this particular case because rq->lock (and therefore all it's nested locks) are IRQ-safe.