On Wed, 22 Jan 2025 11:14:57 +0100 Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Jan 21, 2025 at 03:19:44PM -0500, Steven Rostedt wrote:
> > From: Steven Rostedt <rostedt@goodmis.org>
> > 
> > raw_spin_locks can be traced by lockdep or tracing itself. Atomic64 operations can be used in the tracing infrastructure. When an architecture does not have true atomic64 operations it can use the generic version that disables interrupts and uses spin_locks.
> > 
> > The tracing ring buffer code uses atomic64 operations for the time keeping. But because some architectures use the default operations, the locking inside the atomic operations can cause an infinite recursion.
> 
> As atomic64 is an architecture specific operation, it should not be
> used in generic code :-)
Yes, but the atomic64 implementation is architecture specific. I could change that to be:
"As atomic64 implementation is architecture specific, it should not"
> > be using raw_spin_locks() but instead arch_spin_locks as that is the purpose of arch_spin_locks. To be used in architecture specific implementations of generic infrastructure like atomic64 operations.
> 
> Urgh.. this is horrible. This is why you shouldn't be using atomic64 in generic code much :/
> 
> Why not just drop support for those crummy archs? Or drop whatever trace feature depends on this.
Can't, that would be a regression. Here's the history. The timestamps of events are related to each other: an event only stores the delta from the previous event (yeah, this causes issues, but it was recommended to do it this way when it was created, and it can't change now). And because the ring buffer is lockless and can be preempted by interrupts and NMIs that inject their own timestamps, it used to be that an interrupted event would just have a zero delta. If an interrupt came in while an event was being written and it created events, all of its events would have the same timestamp as the event it interrupted.
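To make that concrete, here's a toy userspace illustration (not the actual ring buffer code, just the shape of it) of how absolute timestamps fall out of the per-event deltas, and why the old zero-delta behavior made an interrupting event share the timestamp of the event it interrupted:

#include <stdint.h>
#include <stdio.h>

/* Toy model only -- the real ring buffer stores deltas in the event
 * header and has far more cases than this. */
struct event {
        uint64_t delta;         /* offset from the previous event's timestamp */
};

int main(void)
{
        /* The third event stands in for one injected by an interrupt
         * under the old behavior: it carries a delta of zero. */
        struct event events[] = { { 1000 }, { 250 }, { 0 }, { 40 } };
        uint64_t ts = 0;

        for (unsigned int i = 0; i < sizeof(events) / sizeof(events[0]); i++) {
                ts += events[i].delta;
                printf("event %u: timestamp %llu\n", i,
                       (unsigned long long)ts);
        }
        return 0;
}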
But this caused issues, because there was no way to see the timings of events from interrupts that interrupted an event in progress.
I fixed this, but that required doing a 64 bit cmpxchg on the timestamp when the race occurred. I originally did not use atomic64; instead, for 32 bit architectures, the code used a "special" timestamp that was broken into multiple 32 bit words, with special logic to try to keep them in sync when this happened. But that started becoming too complex, with some nasty corner cases, so I decided to simply let these 32 bit architectures use atomic64. That worked fine for architectures that have 64 bit atomics and do not rely on spinlocks.
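Roughly, the operation that needs the 64 bit cmpxchg has this shape (a userspace C11 sketch of the idea, not the ring buffer code; the names are made up): only move the shared write stamp forward if nothing raced in and moved it first. On 32 bit architectures without a native 64 bit cmpxchg, this is exactly where the generic, lock-based atomic64 implementation gets pulled in.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static _Atomic uint64_t write_stamp;

/*
 * Try to install new_ts, but only if the stamp is still the value we
 * saw before the race window. Returns false if an interrupt or NMI got
 * in between and already updated it.
 */
static bool update_write_stamp(uint64_t expected, uint64_t new_ts)
{
        return atomic_compare_exchange_strong(&write_stamp, &expected,
                                              new_ts);
}

int main(void)
{
        atomic_store(&write_stamp, 1000);
        update_write_stamp(1000, 1250); /* succeeds, stamp is now 1250 */
        update_write_stamp(1000, 1300); /* fails, someone got there first */
        return 0;
}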
Then I started getting reports of the tracing system causing deadlocks. That is because raw_spin_lock() is traced. And it should be, as locks do cause issues and tracing them can help debug those issues. Lockdep and tracing both use arch_spin_lock() so that they do not recurse into themselves. Even RCU uses it. So I don't see why there would be any issue with the atomic64 implementation using it, as it is an even more basic operation than RCU is.
> >  s64 generic_atomic64_read(const atomic64_t *v)
> >  {
> >  	unsigned long flags;
> > -	raw_spinlock_t *lock = lock_addr(v);
> > +	arch_spinlock_t *lock = lock_addr(v);
> >  	s64 val;
> > 
> > -	raw_spin_lock_irqsave(lock, flags);
> > +	local_irq_save(flags);
> > +	arch_spin_lock(lock);
> Note that this is not an equivalent change. It's probably sufficient, but at the very least the Changelog should call out what went missing and how that is okay.
What exactly is the difference here that you are talking about? I know that raw_spin_lock_irqsave() has lots of different variants depending on the config options, but I'm not sure which one you mean. Is it the fact that you can't do the different variants with this?
Or is it because it's not checked by lockdep? Hmm, I mentioned that in the cover letter, but I failed to mention it here in this change log. I can definitely add that, if that's what you are referring to.
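For reference, here's roughly what the whole read side looks like with the conversion applied. The unlock half is just the obvious counterpart of the hunk above, so take it as a sketch rather than a quote of the patch:

s64 generic_atomic64_read(const atomic64_t *v)
{
        unsigned long flags;
        arch_spinlock_t *lock = lock_addr(v);
        s64 val;

        /* arch_spin_lock() does not disable interrupts, so do it by hand. */
        local_irq_save(flags);
        arch_spin_lock(lock);
        val = v->counter;
        arch_spin_unlock(lock);
        local_irq_restore(flags);

        return val;
}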
-- Steve