Please don't top-post :( FDFY:
On 11/10/25 09:19, Philipp Stanner wrote:
The spsc_queue is an unlocked, highly asynchronous piece of infrastructure. Its inline function spsc_queue_peek() obtains the head entry of the queue.
This access is performed without READ_ONCE() and is therefore a data race, i.e., undefined behavior. To prevent the compiler from reordering that access, tearing it, or optimizing it away entirely, READ_ONCE() is strictly necessary. That this is the intended pattern is underlined by spsc_queue_pop(), which accesses the head exactly this way.
Add READ_ONCE() to spsc_queue_peek().
Cc: stable@vger.kernel.org # v4.16+
Fixes: 27105db6c63a ("drm/amdgpu: Add SPSC queue to scheduler.")
Signed-off-by: Philipp Stanner <phasta@kernel.org>
I think this makes it less broken, but I'm not even sure if it's enough or more memory barriers or an rcu_dereference() would be correct. The spsc_queue is, of course, not documented and the existing barrier comments are either false or not telling.
If someone has an idea, shoot us the info. Otherwise I think this is the right thing to do for now.
P.
 include/drm/spsc_queue.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/drm/spsc_queue.h b/include/drm/spsc_queue.h
index ee9df8cc67b7..39bada748ffc 100644
--- a/include/drm/spsc_queue.h
+++ b/include/drm/spsc_queue.h
@@ -54,7 +54,7 @@ static inline void spsc_queue_init(struct spsc_queue *queue)
 static inline struct spsc_node *spsc_queue_peek(struct spsc_queue *queue)
 {
-	return queue->head;
+	return READ_ONCE(queue->head);
}
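For reference, this is the very pattern the consumer side already uses; the following is sketched from memory rather than quoted verbatim from include/drm/spsc_queue.h:

	static inline struct spsc_node *spsc_queue_pop(struct spsc_queue *queue)
	{
		struct spsc_node *next, *node;

		/* The pop path already loads the head with READ_ONCE() ... */
		node = READ_ONCE(queue->head);
		if (!node)
			return NULL;

		/* ... and keeps using the _ONCE() accessors for the other
		 * shared pointers as well. */
		next = READ_ONCE(node->next);
		WRITE_ONCE(queue->head, next);
		...
	}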
On Mon, 2025-11-10 at 12:24 +0100, Christian König wrote:
As far as I can see that is not correct or rather not complete.
It cannot be incorrect by definition, because it simply ensures that the load will actually take place there.
Incomplete it can be.
The peek function should only be used to opportunistically look at the top of the queue. It would only be problematic if it returned a non-NULL value once and then NULL later.
The whole idea of the SPSC is that the queue itself is barrier-free; signaling new entries to the consumer side is what provides the barrier.
So basically on the provider side you have:

	spsc_push(entry);
	wake_up(consumer);
And on the consumer side you have:
	woken_up_by_provider() {
		entry = spsc_peek();
		...
		spsc_pop();
	}
Well no, the scheduler can pick up the next job whenever it feels like it. Restarting it, for example, will have it peek into your queue regardless of wake events.
In any case this is a highly fragile design. See below, too.
The problem we are facing here is that the spsc only provides the guarantee that you see the entry pointer, but not the content of entry itself.
So use cases like:
	woken_up_by_provider() {
		while ((entry = spsc_peek())) {
			...
			spsc_pop();
		}
	}
Are illegal since you don't have the correct memory barriers any more.
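To illustrate the point (job->fence and node_to_job() are made-up names; the hazard is the one described above):

	/* Provider: */
	job->fence = fence;		/* (A) fill in the payload */
	spsc_push(&job->node);		/* (B) publish the node    */
	wake_up(consumer);		/* orders (A) and (B) for
					 * the woken-up consumer   */

	/* Consumer, looping without going back to sleep: */
	while ((node = spsc_peek())) {
		job = node_to_job(node);
		use(job->fence);	/* from the second iteration on you
					 * may see the node from (B) but
					 * stale payload from before (A),
					 * because no wakeup barrier is
					 * paired with this peek          */
		spsc_pop();
	}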
I can't follow. Are you saying that spsc_queue_peek() is correct because you know for sure that when it's used no one else might be changing that pointer?
Even if that were true this design is highly fragile.
Took me an eternity to understand that as well, so bear with me for not having explained it before.
s/explain/document :)
As discussed a few weeks ago with Sima and Tvrtko, what we really need to move to in all of DRM is this development pattern:
1. For parallel code, at first by default use a boring, horribly slow (sarcasm) spinlock. BTW I'm not even convinced that a spinlock is slower than lockless tricks. Paul's book states that a CAS atomic instruction takes about 60 cycles, and taking a lock takes 100.
2. If you want to do parallelism without locks, you need to justify it really well. "rmb() so list_empty() works without a lock" doesn't qualify; only real, measured use-case speedups do.
3. If you write lockless infrastructure, you need to document it really well. In particular you need to state:
   1. How it works
   2. What the rules are
See llist.h as an example: it clearly states when you need a lock and when you don't. Or look at RCU. No one could use either without such good documentation.
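A rough sketch of what that documentation buys you (work_item, submit() and process() are invented names; the locking rules are the ones llist.h itself documents):

	#include <linux/llist.h>

	struct work_item {
		struct llist_node llnode;
		int payload;
	};

	static LLIST_HEAD(pending);

	/* Producer(s): llist.h documents llist_add() as safe without a
	 * lock, even with multiple concurrent producers. */
	static void submit(struct work_item *item)
	{
		llist_add(&item->llnode, &pending);
	}

	/* Consumer: llist_del_all() atomically takes over the whole list.
	 * Per llist.h, a lock is only needed once llist_del_first() is
	 * mixed with other concurrent deleters. The entries come back in
	 * reverse insertion order; llist_reverse_order() exists if that
	 * matters. */
	static void drain(void)
	{
		struct llist_node *list = llist_del_all(&pending);
		struct work_item *item, *tmp;

		llist_for_each_entry_safe(item, tmp, list, llnode)
			process(item);	/* process() stands in for real work */
	}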
I have no idea whether spsc_queue is correctly implemented (I doubt it), and if even a highly experienced dev like you takes "an eternity" (quote) to understand it, one is really tempted to dream of spinlock_t, which has very clear semantics and is easily understood even by beginners.
Question is what should we do?
Easy!
Mid-term, we should replace spsc_queue with a boring, locked, super-slow linked list ^_^
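Something like this, that is (boring_queue and friends are invented names, just to sketch the idea):

	#include <linux/list.h>
	#include <linux/spinlock.h>

	struct boring_queue {
		spinlock_t		lock;
		struct list_head	items;
	};

	static void boring_queue_init(struct boring_queue *q)
	{
		spin_lock_init(&q->lock);
		INIT_LIST_HEAD(&q->items);
	}

	static void boring_push(struct boring_queue *q, struct list_head *item)
	{
		spin_lock(&q->lock);
		list_add_tail(item, &q->items);
		spin_unlock(&q->lock);
	}

	/* Returns the oldest item, or NULL if the queue is empty. */
	static struct list_head *boring_pop(struct boring_queue *q)
	{
		struct list_head *item = NULL;

		spin_lock(&q->lock);
		if (!list_empty(&q->items)) {
			item = q->items.next;
			list_del(item);
		}
		spin_unlock(&q->lock);

		return item;
	}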
The spsc_queue was designed and – perhaps – perf-measured when RR was the only scheduling policy.
Since the FIFO rework, where FIFO became the default policy, we now access our super-efficient, super-fast lockless queue most of the time with the spinlock being taken immediately afterwards anyway. So we almost have a locked lockless queue now.
https://elixir.bootlin.com/linux/v6.18-rc4/source/drivers/gpu/drm/scheduler/...
Only push_job() often (?) still runs locklessly, but I'm not at all convinced that this is worth it. Not even performance-wise.
If you look at spsc_queue_push() you see that it 1. disables preemption, 2. uses atomic instructions, and 3. has a ton of memory barriers
– in other words, this is almost literally a manual re-implementation of a spinlock, just without mutual exclusion…
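From memory (not a verbatim copy), the function looks roughly like this:

	static inline bool spsc_queue_push(struct spsc_queue *queue,
					   struct spsc_node *node)
	{
		struct spsc_node **tail;

		node->next = NULL;

		preempt_disable();			/* 1. no preemption */

		/* 2. atomic exchange of the tail pointer */
		tail = (struct spsc_node **)atomic_long_xchg(&queue->tail,
							     (long)&node->next);
		WRITE_ONCE(*tail, node);
		atomic_inc(&queue->job_count);

		smp_wmb();				/* 3. barriers      */

		preempt_enable();

		return (tail == &queue->head);
	}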
And even if something like spsc_queue made sense and were actually worth it, it should be provided to the entire kernel as common infrastructure, like llist.h, not hidden somewhere in DRM.
P.