On Tue, 16 Aug 2022 14:05:54 +0100, Jon Nettleton jon@solid-run.com wrote:
On Tue, Aug 16, 2022 at 3:01 PM Will Deacon will@kernel.org wrote:
On Tue, Aug 16, 2022 at 02:29:49PM +0200, Jon Nettleton wrote:
On Tue, Aug 16, 2022 at 10:17 AM Arnd Bergmann arnd@arndb.de wrote:
On Tue, Aug 16, 2022 at 9:03 AM Hector Martin marcan@marcan.st wrote:
These operations are documented as always ordered in include/asm-generic/bitops/instrumented-atomic.h, and producer-consumer use cases, where one side needs to ensure a flag is left pending after some shared data was updated, rely on this ordering even in the failure case.
This is the case with the workqueue code, which currently suffers from a reproducible ordering violation on Apple M1 platforms (which are notoriously out-of-order) that ends up causing the TTY layer to fail to deliver data to userspace properly under the right conditions. This change fixes that bug.
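To illustrate the pattern the ordering protects (a minimal sketch; struct item, PENDING and the helpers are made-up names, not the actual workqueue/TTY code):

#define PENDING		0	/* hypothetical flag bit */

struct item {
	unsigned long	flags;
	int		data;
};

/*
 * Producer: publish the data, then try to mark the item pending. If
 * the bit is already set, the *failed* test_and_set_bit() must still
 * order the data store against the consumer's accesses, otherwise
 * the update can be missed.
 */
static void producer(struct item *it, int val)
{
	WRITE_ONCE(it->data, val);
	if (!test_and_set_bit(PENDING, &it->flags))
		kick_consumer(it);		/* made-up helper */
}

/*
 * Consumer: clear the flag first, then read the data. clear_bit() is
 * unordered, hence the explicit barrier.
 */
static void consumer(struct item *it)
{
	clear_bit(PENDING, &it->flags);
	smp_mb__after_atomic();
	handle(READ_ONCE(it->data));		/* made-up helper */
}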
Change the documentation to restrict the "no order on failure" story to the _lock() variant (for which it makes sense), and remove the early exit from the generic implementation, which is what causes the missing barrier semantics in that case. With the early exit gone, the remaining atomic op is fully ordered (including on ARM64 LSE, as of recent versions of the architecture spec).
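For reference, the problematic early exit in the generic code looks roughly like this (paraphrased from include/asm-generic/bitops/atomic.h, not the exact diff):

static __always_inline bool
arch_test_and_set_bit(unsigned int nr, volatile unsigned long *p)
{
	long old;
	unsigned long mask = BIT_MASK(nr);

	p += BIT_WORD(nr);
	if (READ_ONCE(*p) & mask)	/* early exit, no barrier at all */
		return 1;

	/* fully ordered RMW; this is all that remains after the patch */
	old = arch_atomic_long_fetch_or(mask, (atomic_long_t *)p);
	return !!(old & mask);
}

arch_test_and_clear_bit() has the same shape (with fetch_andnot()) and loses its early exit in the same way.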
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Fixes: e986a0d6cb36 ("locking/atomics, asm-generic/bitops/atomic.h: Rewrite using atomic_*() APIs")
Fixes: 61e02392d3c7 ("locking/atomic/bitops: Document and clarify ordering semantics for failed test_and_{}_bit()")
Signed-off-by: Hector Martin <marcan@marcan.st>
 Documentation/atomic_bitops.txt     | 2 +-
 include/asm-generic/bitops/atomic.h | 6 ------
I double-checked all the architecture-specific implementations to ensure that the asm-generic one is the only one that needs the fix.
I assume this gets merged through the locking tree or that Linus picks it up directly, not through my asm-generic tree.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Testing this patch on pre-ARMv8.1 hardware, specifically Cortex-A72 and Cortex-A53 cores, I am seeing a huge performance drop with this patch applied. Perf shows lock_is_held_type() as the worst offender.
Hmm, that should only exist if LOCKDEP is enabled, and performance tends to go out of the window if you have that on. Can you reproduce the same regression with lockdep disabled?
Will
Yep, I am working on it. We should note that

config LOCKDEP_SUPPORT
	def_bool y

is the default for arm64.
Yes, as the architecture supports LOCKDEP. However, you probably have something like CONFIG_PROVE_LOCKING enabled to see such a performance hit (and that's definitely not on by default).
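For reference, PROVE_LOCKING is a separate debug option that has to be switched on explicitly; from memory its entry in lib/Kconfig.debug looks roughly like this (exact dependencies vary by kernel version):

config PROVE_LOCKING
	bool "Lock debugging: prove locking correctness"
	depends on DEBUG_KERNEL
	select LOCKDEP
	default n

So LOCKDEP_SUPPORT=y only says the architecture is able to support lockdep; nothing expensive runs unless an option like this one selects LOCKDEP.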
M.