On Mon, Dec 28, 2020 at 9:23 AM Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote:
On Mon, Dec 28, 2020 at 09:14:23AM -0800, Andy Lutomirski wrote:
On Mon, Dec 28, 2020 at 2:25 AM Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote:
On Sun, Dec 27, 2020 at 01:36:13PM -0800, Andy Lutomirski wrote:
On Sun, Dec 27, 2020 at 12:18 PM Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
----- On Dec 27, 2020, at 1:28 PM, Andy Lutomirski luto@kernel.org wrote:
I admit that I'm rather surprised that the code worked at all on arm64, and I'm suspicious that it has never been very well tested. My apologies for not reviewing this more carefully in the first place.
Please refer to Documentation/features/sched/membarrier-sync-core/arch-support.txt
It clearly states that only arm, arm64, powerpc and x86 support the membarrier sync core feature as of now:
Sigh, I missed arm (32). Russell or ARM folks, what's the right incantation to make the CPU notice instruction changes initiated by other cores on 32-bit ARM?
You need to call flush_icache_range(), since the changes need to be flushed from the data cache to the point of unification (of the Harvard I and D), and the instruction cache needs to be invalidated so it can then see those updated instructions. This will also take care of the necessary barriers that the CPU requires for you.
With what parameters? From looking at the header, this is for the case in which the kernel writes some memory and then intends to execute it. That's not what membarrier() does at all. membarrier() works like this:
You didn't specify that you weren't looking at kernel memory.
If you're talking about userspace, then the interface you require is flush_icache_user_range(), which does the same as flush_icache_range() but takes userspace addresses. Note that this requires that the memory is currently mapped at those userspace addresses.
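For the userspace case, the usual way for a program (a JIT, say) to reach this path is the GCC/Clang builtin `__builtin___clear_cache()`. On 32-bit ARM Linux the toolchain implements it with the ARM-private `cacheflush` system call, which the kernel services with a user-range flush of the kind described above. The sketch below is illustrative only: `publish_code()` is a hypothetical helper, and it assumes `dst` is already mapped writable and executable at that address in the current process.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy generated instructions into a region that is
 * assumed to be mapped RW+X at dst, then ask the toolchain to make the new
 * bytes visible to instruction fetch. On 32-bit ARM Linux the builtin goes
 * through the cacheflush syscall; on x86 it compiles to nothing because the
 * instruction cache is coherent with data writes. */
void *publish_code(void *dst, const void *insns, size_t len)
{
    memcpy(dst, insns, len);
    __builtin___clear_cache((char *)dst, (char *)dst + len);
    return dst;
}
```

Like flush_icache_user_range(), it cannot do its job without an explicit start and end address.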
If that doesn't fit your needs, there isn't an interface to do what you require, and it basically means creating something brand new on every architecture.
What you are asking for is not "just a matter of a few instructions". I have stated the required steps to achieve what you require above; that is the minimum when you have non-snooping Harvard caches, which the majority of 32-bit ARMs have.
User thread 1:

write to RWX memory *or* write to an RW alias of an X region.
membarrier(...);
somehow tell thread 2 that we're ready (with a store release, perhaps, or even just a relaxed store.)

User thread 2:

wait for the indication from thread 1.
barrier();
jump to the code.
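The two-thread outline can be sketched in userspace C roughly as follows. This is a hedged illustration, not code anyone in the thread proposed: it assumes x86 (where the six bytes below encode `mov eax, 42; ret`), Linux 4.16+ headers providing `MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE`, and it uses hypothetical names (`run_handoff`, `thread2`, `sys_membarrier`). On any other setup it returns -1 instead of guessing.

```c
#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <pthread.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

enum { WAITING, READY, ABORTED };

static unsigned char *code_buf;   /* RWX region shared by both threads */
static atomic_int code_state;     /* thread 1 -> thread 2 handoff flag */

/* No libc wrapper is assumed, so call the syscall directly. */
static int sys_membarrier(int cmd, unsigned int flags)
{
    return (int)syscall(__NR_membarrier, cmd, flags);
}

/* User thread 2: wait for the indication from thread 1, then jump to the code. */
static void *thread2(void *arg)
{
    int *result = arg;
    int state;

    /* The acquire load also serves as the compiler barrier() in the outline. */
    while ((state = atomic_load_explicit(&code_state, memory_order_acquire)) == WAITING)
        ;
    if (state == READY) {
        int (*fn)(void) = (int (*)(void))code_buf;
        *result = fn();
    }
    return NULL;
}

/* Returns what the freshly written code returns (42), or -1 if the sketch
 * cannot run here: not x86, no RWX mapping, or no SYNC_CORE support. */
int run_handoff(void)
{
#if defined(__x86_64__) || defined(__i386__)
    static const unsigned char insns[] = {
        0xB8, 0x2A, 0x00, 0x00, 0x00,   /* mov eax, 42 */
        0xC3                            /* ret */
    };
    int result = -1;
    pthread_t t2;

    /* A process must register before it may use the SYNC_CORE command. */
    if (sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0) != 0)
        return -1;

    code_buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (code_buf == MAP_FAILED)
        return -1;

    atomic_store(&code_state, WAITING);
    if (pthread_create(&t2, NULL, thread2, &result) != 0) {
        munmap(code_buf, 4096);
        return -1;
    }

    /* User thread 1: write to RWX memory ... */
    memcpy(code_buf, insns, sizeof insns);
    /* ... membarrier(...) so every running thread of this process is
     * core-serialized before it next returns to user space ... */
    int ok = sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0) == 0;
    /* ... then tell thread 2 we're ready, here with a store-release. */
    atomic_store_explicit(&code_state, ok ? READY : ABORTED, memory_order_release);

    pthread_join(t2, NULL);
    munmap(code_buf, 4096);
    return result;
#else
    return -1;
#endif
}
```

On an x86 Linux system that permits RWX mappings and the SYNC_CORE command, `run_handoff()` is expected to return 42. Thread 2 is started before the write on purpose, since the command exists to serialize threads that are already running.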
membarrier() is, for better or for worse, not given a range of addresses.
Then, I'm sorry, it can't work on 32-bit ARM.
Is there a way to flush the *entire* user icache? If so, and if it has reasonable performance, then it could probably be used here. Otherwise I'll just send a revert for this whole mechanism on 32-bit ARM.
--Andy