On 16/08/2022 14.27, Linus Torvalds wrote:
On Mon, Aug 15, 2022 at 9:15 PM Herbert Xu herbert@gondor.apana.org.au wrote:
Please revert this as test_and_set_bit was always supposed to be a full memory barrier. This is an arch bug.
<snip>
From looking at it, the asm-generic implementation is a bit questionable, though. In particular, it does
if (READ_ONCE(*p) & mask) return 1;
so it's *entirely* unordered for the "bit was already set" case.
These ops are documented in Documentation/atomic_bitops.txt as being unordered in the failure ("bit was already set" case), and that matches the generic implementation (which arm64 uses).
On the other hand, Twitter just pointed out that contradicting documentation exists (I believe this was the source of the third party doc I found that claimed it's always a barrier):
include/asm-generic/bitops/instrumented-atomic.h
So either one doc and the implementation are wrong, or the other doc is wrong.
--- a/include/asm-generic/bitops/atomic.h +++ b/include/asm-generic/bitops/atomic.h @@ -39,9 +39,6 @@ arch_test_and_set_bit( unsigned long mask = BIT_MASK(nr);
p += BIT_WORD(nr);
- if (READ_ONCE(*p) & mask)
return 1;
- old = arch_atomic_long_fetch_or(mask, (atomic_long_t *)p); return !!(old & mask);
}
@@ -53,9 +50,6 @@ arch_test_and_clear_bit unsigned long mask = BIT_MASK(nr);
p += BIT_WORD(nr);
- if (!(READ_ONCE(*p) & mask))
return 0;
- old = arch_atomic_long_fetch_andnot(mask, (atomic_long_t *)p); return !!(old & mask);
}
but the above is not just whitespace-damaged, it's entirely untested and based purely on me looking at that code.
That does work, I in fact tried that fix first to prove that the early-exit was the problem.
I don't have a particularly strong opinion on which way is right here, I just saw the implementation matched the docs (that I found) and Will commented on the other thread implying this was all deliberate.
- Hector