Excerpts from Russell King - ARM Linux admin's message of December 29, 2020 8:44 pm:
On Tue, Dec 29, 2020 at 01:09:12PM +1000, Nicholas Piggin wrote:
I think it should certainly be documented in terms of what guarantees it provides to applications, _not_ the kinds of instructions it may or may not induce the core to execute. And if the existing API can't be re-documented sanely, then deprecated and new ones added that DTRT. Possibly under a new system call, if arches like ARM want a range flush and we don't want to expand the multiplexing behaviour of membarrier even more (sigh).
The 32-bit ARM sys_cacheflush() is there only to support self-modifying code, and takes whatever actions are necessary to support that. Exactly what actions it takes are cache implementation specific, and should be of no concern to the caller, but the underlying thing is... it's to support self-modifying code.
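For illustration, a minimal sketch of how userspace usually reaches this path without naming the system call: GCC's __builtin___clear_cache() states the intent ("I wrote code into this range"), and the toolchain picks the mechanism. On 32-bit ARM Linux that typically ends up in the cacheflush call; on x86 it compiles to nothing. The helper name install_code() is hypothetical, and the sketch deliberately stops short of executing the buffer so it stays architecture-neutral.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: copy `len` bytes of generated code into a fresh
 * anonymous mapping and make it executable.
 * Returns the mapping on success, NULL on failure or if len is 0.
 */
static void *install_code(const void *code, size_t len)
{
    if (len == 0)
        return NULL;
    assert(code != NULL);

    /* Fill the buffer through a writable, non-executable mapping. */
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    memcpy(buf, code, len);

    /* State the intent; the compiler and kernel choose the mechanism. */
    __builtin___clear_cache((char *)buf, (char *)buf + len);

    /* Only now allow execution of the range. */
    if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, len);
        return NULL;
    }
    return buf;
}
```

Note that this only covers the calling thread's view of the range; it says nothing about other threads that may already be running in, or have prefetched, the old contents.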
Caveat cacheflush() should not be used in programs intended to be portable. On Linux, this call first appeared on the MIPS architecture, but nowadays, Linux provides a cacheflush() system call on some other architectures, but with different arguments.
What a disaster. Another badly designed interface, although it didn't originate in Linux it sounds like we weren't to be outdone so we messed it up even worse.
Flushing caches is neither necessary nor sufficient for code modification on many processors. Maybe some old MIPS specific private thing was fine, but certainly before it grew to other architectures, somebody should have thought for more than 2 minutes about it. Sigh.
Sadly, because it's existed for 20+ years, and it has historically been sufficient for other purposes too, it has seen quite a bit of abuse despite its design purpose not changing - it's been used by graphics drivers for example. They quickly learnt the error of their ways with ARMv6+, since it does not do enough for their purposes given the cache architectures found there.
Let's not go around redesigning this after twenty odd years, requiring a hell of a lot of pain to users. This interface is called by code generated by GCC, so to change it you're looking at patching GCC as well as the kernel, and you basically will make new programs incompatible with older kernels - very bad news for users.
For something to be redesigned it had to have been designed in the first place, so there is no danger of that, don't worry... But no, I never suggested making incompatible changes to any existing system call, I said "re-documented". And yes, I said deprecated, but in Linux that really means kept indefinitely.
If ARM, MIPS, 68k etc programs and toolchains keep using what they are using it'll keep working no problem.
The point is we're growing new interfaces, and making the same mistakes. membarrier SYNC_CORE is not portable (ARCH_HAS_MEMBARRIER_SYNC_CORE), it's also specified in terms of low-level processor operations rather than higher-level intent, and it is also not sufficient for self-modifying code (without an additional cache flush on some processors).
The application wants a call that says something like "memory modified before the call will be visible as instructions (including illegal instructions) by all threads in the program after the system call returns, and no threads will be subject to any effects of executing the previous contents of that memory."
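To make the gap concrete, here is a rough sketch of what an application has to stitch together today to approximate that guarantee: a range-based cache operation via __builtin___clear_cache(), followed by the membarrier SYNC_CORE command where the kernel offers it. The names sys_membarrier() and publish_code() are hypothetical, and the return convention is an assumption made for this example. It is not a claim that this sequence gives the semantics described above on every architecture, which is rather the point.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/membarrier.h>

/* Thin wrapper over the raw system call (hypothetical name). */
static int sys_membarrier(int cmd, unsigned int flags)
{
    return (int)syscall(__NR_membarrier, cmd, flags, 0);
}

/*
 * Hypothetical "code in [start, start+len) was modified" call.
 * Returns  1 if the range was flushed and SYNC_CORE was issued,
 *          0 if only the local range flush ran (SYNC_CORE unavailable),
 *         -1 if the kernel advertised SYNC_CORE but the call failed.
 */
static int publish_code(void *start, size_t len)
{
    assert(start != NULL && len > 0);

    /* I-side/D-side maintenance for the range, whatever the arch needs. */
    __builtin___clear_cache((char *)start, (char *)start + len);

    /* SYNC_CORE is optional, so ask the kernel what it supports. */
    int supported = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);
    if (supported < 0 ||
        !(supported & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE))
        return 0;

    /* The process must register before using the expedited command. */
    if (sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0) != 0)
        return -1;

    /* Ask for a core-serializing event on the process's other threads. */
    if (sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0) != 0)
        return -1;

    return 1;
}
```

The caller has to know about two mechanisms, their ordering, and a per-architecture availability check, where a single call documented in terms of intent would hide all three.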
So I think the basics are simple (although we should confirm with some JIT and debugger etc developers, and not just Android mind you). There are some complications in details, address ranges, virtual/physical, thread local vs process vs different process or system-wide, memory ordering and propagation of i and d sides, etc. But that can be worked through, erring on the side of sanity rather than pointless micro-optimisations.
Thanks, Nick