Re: [PATCH] workqueue: Fix memory ordering race in queue_work*()

16 Aug 2022


On Wed, Aug 17, 2022 at 01:22:09AM +0900, Hector Martin wrote:
...
On 16/08/2022 23.55, Boqun Feng wrote:
...
On Tue, Aug 16, 2022 at 02:41:57PM +0100, Will Deacon wrote:
...
It's worth noting that with the spinlock-based implementation (i.e.
prior to e986a0d6cb36) then we would have the same problem on
architectures that implement spinlocks with acquire/release semantics;
accesses from outside of the critical section can drift in and reorder
with each other there, so the conversion looked legitimate to me in
isolation and I vaguely remember going through callers looking for
potential issues. Alas, I obviously missed this case.
I just want to mention that although spinlock-based atomic bitops
don't provide a full barrier in test_and_set_bit(), they don't
have the problem spotted by Hector, because test_and_set_bit() and
clear_bit() sync with each other via locks:
test_and_set_bit():
     lock(..);
     old = *p; // mask is already set by other test_and_set_bit()
     *p = old | mask;
     unlock(...);
   			clear_bit():
   			  lock(..);
   			  *p &= ~mask;
   			  unlock(..);
so "having a full barrier before test_and_set_bit()" may not be the
exact thing we need here: as long as a test_and_set_bit() can sync with
a clear_bit() unconditionally, the world is safe. For example, we
can make test_and_set_bit() RELEASE and clear_bit() ACQUIRE on ARM64:
test_and_set_bit():
     atomic_long_fetch_or_release(..); // pair with clear_bit()
     				    // guarantee everything is
   				    // observed.
     			clear_bit():
   			  atomic_long_fetch_andnot_acquire(..);
     
Maybe that's somewhat cheaper than a full barrier implementation.
Thoughts? Just to find the exact ordering requirement for bitops.
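
For readers following along, the two variants discussed above can be modelled in userspace C11. This is only a sketch under stated assumptions: toy_lock()/toy_unlock() and the locked_*/release_*/acquire_* names are hypothetical, the toy spinlock stands in for an arch spinlock with acquire/release semantics, and C11 memory orders only approximate the kernel's memory model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Variant 1: bitops serialized by a toy spinlock with acquire/release
 * semantics (hypothetical stand-in for the arch spinlock). */
static atomic_flag bit_lock = ATOMIC_FLAG_INIT;

static void toy_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&bit_lock, memory_order_acquire))
		; /* spin until the current holder releases the lock */
}

static void toy_unlock(void)
{
	atomic_flag_clear_explicit(&bit_lock, memory_order_release);
}

static bool locked_test_and_set_bit(unsigned int nr, unsigned long *p)
{
	unsigned long mask = 1UL << nr;
	unsigned long old;

	toy_lock();
	old = *p;
	*p = old | mask;
	toy_unlock();
	return (old & mask) != 0;
}

static void locked_clear_bit(unsigned int nr, unsigned long *p)
{
	toy_lock();
	*p &= ~(1UL << nr);
	toy_unlock();
}

/* Variant 2: the proposal, RELEASE on set and ACQUIRE on clear. */
static bool release_test_and_set_bit(unsigned int nr, atomic_ulong *p)
{
	unsigned long mask = 1UL << nr;
	unsigned long old = atomic_fetch_or_explicit(p, mask, memory_order_release);

	return (old & mask) != 0;
}

static void acquire_clear_bit(unsigned int nr, atomic_ulong *p)
{
	(void)atomic_fetch_and_explicit(p, ~(1UL << nr), memory_order_acquire);
}

/* Single-threaded sanity check; returns the final value of both words. */
static unsigned long bitops_demo(void)
{
	unsigned long w = 0;
	atomic_ulong a = 0;

	assert(!locked_test_and_set_bit(3, &w));   /* bit was clear */
	assert(locked_test_and_set_bit(3, &w));    /* bit is now set */
	locked_clear_bit(3, &w);

	assert(!release_test_and_set_bit(3, &a));
	assert(release_test_and_set_bit(3, &a));
	acquire_clear_bit(3, &a);

	return w | atomic_load(&a);
}
```

In both variants the ordering comes from the two read-modify-write operations observing each other, either through the lock or through the release/acquire pair on the same word. Neither variant orders accesses on both sides of test_and_set_bit() the way a full barrier would.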
It's worth pointing out that the workqueue code does *not* pair
test_and_set_bit() with clear_bit(). It does an atomic_long_set()
instead (and then there are explicit barriers around it, which are
expected to pair with the implicit barrier in test_and_set_bit()). If we
define test_and_set_bit() to only sync with clear_bit() and not
necessarily be a true barrier, that breaks the usage of the workqueue code.
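
The pattern described here can be sketched in userspace C11 as follows. This is an illustration, not the workqueue code: PENDING_MASK, claim_pending() and set_data_and_clear_pending() are hypothetical names, and the seq_cst fences are a conservative stand-in for whichever kernel barriers the real code uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PENDING_MASK 1UL /* hypothetical flag bit inside the data word */

/* Fully ordered test-and-set, approximated with fences on both sides. */
static bool claim_pending(atomic_ulong *data)
{
	unsigned long old;

	atomic_thread_fence(memory_order_seq_cst);
	old = atomic_fetch_or_explicit(data, PENDING_MASK, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return (old & PENDING_MASK) == 0; /* true if this caller set the bit */
}

/* Clearing side: a plain store of a whole new word, not clear_bit(). */
static void set_data_and_clear_pending(atomic_ulong *data, unsigned long new_data)
{
	atomic_thread_fence(memory_order_seq_cst); /* explicit barrier before */
	atomic_store_explicit(data, new_data & ~PENDING_MASK, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* explicit barrier after */
}
```

Because the clearing side is a plain store rather than a read-modify-write, there is no clear_bit() for the test-and-set to sync with. The ordering has to come from the barriers themselves, which is why the barrier implied by test_and_set_bit() matters for this caller.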
Ah, I missed that, but that means the old spinlock-based atomics are
totally broken unless spinlocks imply full barriers on these archs.
But still, if we define test_and_set_bit() as RELEASE atomic instead of 
a full barrier + atomic, it should work for workqueue, right? Do we
actually need extra ordering here?
    WRITE_ONCE(*x, 1); // A
    test_and_set_bit(..); // a full barrier will order A & B
    WRITE_ONCE(*y, 1); // B
That's something I want to figure out.
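
To make the question concrete, here is a userspace C11 sketch of the two candidate definitions around the A/B stores. The names publish_full() and publish_release() are hypothetical, relaxed atomic stores stand in for WRITE_ONCE(), and seq_cst fences approximate the kernel's full barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int x, y;     /* the two locations written by A and B */
static atomic_ulong flags;  /* word holding the bit being set */

/* Full-barrier variant: the fences order A before B for any observer
 * that uses suitable barriers of its own. */
static bool publish_full(void)
{
	unsigned long old;

	atomic_store_explicit(&x, 1, memory_order_relaxed);  /* A */
	atomic_thread_fence(memory_order_seq_cst);
	old = atomic_fetch_or_explicit(&flags, 1UL, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&y, 1, memory_order_relaxed);  /* B */
	return (old & 1UL) != 0;
}

/* RELEASE-only variant: A is ordered before the bit update, but nothing
 * keeps B from becoming visible before the update, or before A. */
static bool publish_release(void)
{
	unsigned long old;

	atomic_store_explicit(&x, 1, memory_order_relaxed);  /* A */
	old = atomic_fetch_or_explicit(&flags, 1UL, memory_order_release);
	atomic_store_explicit(&y, 1, memory_order_relaxed);  /* B */
	return (old & 1UL) != 0;
}
```

Run single-threaded, the two functions behave identically. They differ only in what a concurrent observer may see: with the RELEASE-only variant, an observer that reads y == 1 is not guaranteed to also read x == 1.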
Regards,
Boqun
...

Hector
