Re: [PATCH] locking/atomic: Make test_and_*_bit() ordered on failure

16 Aug 2022

      On Tue, Aug 16, 2022 at 10:17 AM Arnd Bergmann arnd@arndb.de wrote:
...
On Tue, Aug 16, 2022 at 9:03 AM Hector Martin marcan@marcan.st wrote:
...
These operations are documented as always ordered in
include/asm-generic/bitops/instrumented-atomic.h, and producer-consumer
type use cases where one side needs to ensure a flag is left pending
after some shared data was updated rely on this ordering, even in the
failure case.
This is the case with the workqueue code, which currently suffers from a
reproducible ordering violation on Apple M1 platforms (which are
notoriously out-of-order) that ends up causing the TTY layer to fail to
deliver data to userspace properly under the right conditions. This
change fixes that bug.
Change the documentation to restrict the "no order on failure" story to
the _lock() variant (for which it makes sense), and remove the
early-exit from the generic implementation, which is what causes the
missing barrier semantics in that case. Without this, the remaining
atomic op is fully ordered (including on ARM64 LSE, as of recent
versions of the architecture spec).
Suggested-by: Linus Torvalds torvalds@linux-foundation.org
Cc: stable@vger.kernel.org
Fixes: e986a0d6cb36 ("locking/atomics, asm-generic/bitops/atomic.h: Rewrite using atomic_*() APIs")
Fixes: 61e02392d3c7 ("locking/atomic/bitops: Document and clarify ordering semantics for failed test_and_{}_bit()")
Signed-off-by: Hector Martin marcan@marcan.st

Documentation/atomic_bitops.txt     | 2 +-
 include/asm-generic/bitops/atomic.h | 6 ------
I double-checked all the architecture specific implementations to ensure
that the asm-generic one is the only one that needs the fix.
I assume this gets merged through the locking tree or that Linus picks it up
directly, not through my asm-generic tree.
Reviewed-by: Arnd Bergmann arnd@arndb.de

linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Testing this patch on pre Armv8.1 specifically Cortex-A72 and
Cortex-A53 cores I am seeing
a huge performance drop with this patch applied. Perf is showing
lock_is_held_type() as the worst offender
but that could just be the function getting blamed. The most obvious
indicator of the performance loss is
ssh throughput.  With the patch I am only able to achieve around
20MB/s and without the patch I am able
to transfer around 112MB/s, no other changes.
When I have more time I can do some more in depth testing, but for now
I just wanted to bring this
issue up so perhaps others can chime in regarding how it performs on
their hardware.
-Jon

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH] locking/atomic: Make test_and_*_bit() ordered on failure