On Thu, Apr 15, 2021 at 04:28:21PM +0100, Will Deacon wrote:
On Thu, Apr 15, 2021 at 05:03:58PM +0200, Peter Zijlstra wrote:
@@ -73,9 +75,8 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
 	/* When no more readers or writers, set the locked flag */
 	do {
-		atomic_cond_read_acquire(&lock->cnts, VAL == _QW_WAITING);
-	} while (atomic_cmpxchg_relaxed(&lock->cnts, _QW_WAITING,
-					_QW_LOCKED) != _QW_WAITING);
+		cnt = atomic_cond_read_acquire(&lock->cnts, VAL == _QW_WAITING);
I think the issue is that >here< a concurrent reader in interrupt context can take the lock and release it again, but we could speculate reads from the critical section up over the later release and up before the control dependency here...
OK, so that was totally not clear from the original changelog.
+	} while (!atomic_try_cmpxchg_relaxed(&lock->cnts, &cnt, _QW_LOCKED));
... and then this cmpxchg() will succeed, so our speculated stale reads could be used.
*HOWEVER*
Speculating a read should be fine in the face of a concurrent _reader_, so for this to be an issue it implies that the reader is also doing some (atomic?) updates.
So we're having something like:
        CPU0                                            CPU1

        queued_write_lock_slowpath()
          atomic_cond_read_acquire() == _QW_WAITING

                                                        queued_read_lock_slowpath()
                                                          atomic_cond_read_acquire()
                                                          return;
   ,--> (before X's store)
   |                                                    X = y;
   |
   |                                                    queued_read_unlock()
   |                                                      (void)atomic_sub_return_release()
   |
   |      atomic_cmpxchg_relaxed(.old = _QW_WAITING)
   |
   `--  r = X;
Which, as Will said, is a cmpxchg ABA. However, for it to be ABA we have to observe that unlock-store, and won't we then also have to observe the whole critical section?
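To make the ABA concrete, here is a rough user-space sketch that replays the counter values from the timeline above on a single thread, using C11 atomics rather than the kernel API. The constants follow my reading of include/asm-generic/qrwlock.h and should be treated as assumptions; aba_cmpxchg_succeeds() is a hypothetical helper, not kernel code. It shows only that the cmpxchg cannot tell the reader came and went. It does not reproduce the load reordering itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed to mirror the kernel's qrwlock encoding; illustrative only. */
#define QW_WAITING 0x100u	/* a writer is waiting */
#define QW_LOCKED  0x0ffu	/* a writer holds the lock */
#define QR_BIAS    0x200u	/* one reader holds the lock */

/*
 * Hypothetical helper: single-threaded replay of the counter values.
 * Returns true if the final compare-exchange succeeds.
 */
static bool aba_cmpxchg_succeeds(void)
{
	atomic_uint cnts = QW_WAITING;

	/* "CPU0": the wait loop observes _QW_WAITING. */
	unsigned int seen = atomic_load_explicit(&cnts, memory_order_acquire);

	/* "CPU1": an interrupt-context reader takes the lock ... */
	atomic_fetch_add_explicit(&cnts, QR_BIAS, memory_order_acquire);
	/* ... and drops it again, restoring the old value. */
	atomic_fetch_sub_explicit(&cnts, QR_BIAS, memory_order_release);

	/* "CPU0": the relaxed cmpxchg sees _QW_WAITING again and succeeds. */
	return atomic_compare_exchange_strong_explicit(&cnts, &seen, QW_LOCKED,
						       memory_order_relaxed,
						       memory_order_relaxed);
}
```

The value compared is identical before and after the reader's visit, so success of the cmpxchg says nothing about which store it actually read from.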
Ah, the issue is yet another load inside our own (CPU0's) critical section, which isn't ordered against the cmpxchg_relaxed() and can be issued before it.
So yes, making the cmpxchg an acquire will force all our own loads to happen later.
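A minimal user-space analogue of that conclusion, assuming C11 atomics stand in for the kernel primitives: write_lock_wait_sketch() is a hypothetical name, the constants are assumed to match the kernel's encoding, and this is a sketch of the ordering argument, not the actual patch. The only point is that the successful compare-exchange carries acquire ordering, so loads in the writer's critical section cannot be satisfied before it.

```c
#include <stdatomic.h>

/* Assumed to mirror the kernel's qrwlock encoding; illustrative only. */
#define QW_WAITING 0x100u	/* a writer is waiting */
#define QW_LOCKED  0x0ffu	/* a writer holds the lock */

/*
 * Hypothetical analogue of the tail of queued_write_lock_slowpath():
 * wait until only the waiting-writer flag is set, then claim the lock.
 */
static void write_lock_wait_sketch(atomic_uint *cnts)
{
	unsigned int cnt;

	do {
		/* Spin until no readers or writers remain. */
		do {
			cnt = atomic_load_explicit(cnts, memory_order_acquire);
		} while (cnt != QW_WAITING);

		/*
		 * Acquire on success: later loads in the critical section
		 * are ordered after this read-modify-write. On failure,
		 * cnt is refreshed and the outer loop waits again.
		 */
	} while (!atomic_compare_exchange_strong_explicit(cnts, &cnt, QW_LOCKED,
							  memory_order_acquire,
							  memory_order_relaxed));
}
```

With a relaxed success ordering instead, nothing would order the critical-section loads against the compare-exchange, which is the gap described above.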