On 18/12/13 05:06, Jonathan S. Shapiro wrote:
> At the risk of sticking my nose in, this isn't a startup code issue. It's a contract issue.
> First, I don't buy Richard's argument about memcpy() startup costs and hard-to-predict branches. We do those tests on essentially every *other* RISC platform without complaint, and it's very easy to order those branches so that the currently efficient cases run well. Perhaps more to the point, I haven't seen anybody put forward quantitative data that using the MMU for unaligned references is any better than executing those branches. Speaking as a recovering processor architect, that assumption needs to be validated quantitatively. My guess is that the branches are faster if properly arranged.
> Second, this is a contract issue. If newlib intends to support embedded platforms, then it needs to implement algorithms that are functionally correct without relying on an MMU. By all means use simpler or smarter algorithms when an MMU can be assumed to be available in a given configuration, but provide an algorithm that is functionally correct when no MMU is available. "Good overall performance in memcpy" is a fine thing, but it is subject to the requirement of meeting functional specifications. As Jochen Liedtke famously put it (read this in a heavy German accent): "Fast, ya. But correct? (shrug) Eh!"
> So: we need a normative statement saying what the contract is. The rest of the answer will fall out from that.
> I do agree with Richard that startup code is special. I've built deeply embedded runtimes of one form or another for 25 years now, and I have yet to see a system where optimizing a simplistic byte-wise memcpy during bootstrap would have made any difference in anything overall. That said, if the specification of memcpy requires it to handle incompatibly aligned pointers (and it does), and the contract for newlib requires it to operate in MMU-less scenarios in a given configuration (which, at least in some cases, it does), it's completely legitimate to expect that bootstrap code can call memcpy() and expect behavior that meets specifications.
> So what's the contract?
I disagree with your assertion that newlib *requires* it to operate in an MMU-less scenario for all targets; it only does so when the target can reasonably be expected to not have an MMU.
The only contract that exists is the one written in the C standard:
7.23.2.1#2 The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.
But that is written on the assumption that we're in a normal execution environment, not in some special case.
What you're missing is that AArch64 is (in ARM ARM terms) an A-profile-only environment where an MMU is mandated in the system. Furthermore, processors implementing the architecture will *expect* the MMU to be turned on as soon as possible after boot, since without it the caches cannot be used, and without the caches performance will be truly horrible. Once the caches are enabled, it's perfectly reasonable to assume that memcpy will only be used for copies to and from NORMAL memory, since other types of memory have potential side effects, which means that use of memcpy would be unsafe.
If you want to write an MMU-less memcpy, then feel free to write one; but please install it with a different interface -- something like __memcpy_nommu(). Don't penalise the standard case for the non-standard exceptional one.
R.
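
For illustration only, here is a minimal sketch of the kind of separate entry point suggested above. The name memcpy_nommu_sketch is a hypothetical stand-in for __memcpy_nommu(); nothing like it is claimed to exist in newlib. It is the simplistic byte-wise copy mentioned in the quoted message: it trades speed for never issuing an access wider than one byte, so it cannot make an unaligned access. It assumes the compiler is prevented from turning the loop back into a call to memcpy() (with GCC, for example, by building the file with -fno-tree-loop-distribute-patterns).

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the __memcpy_nommu() interface suggested above.
 * Only single-byte loads and stores are written here, so the copy does not
 * depend on the relative alignment of dst and src.
 *
 * Assumption: the build keeps the compiler from recognising this loop and
 * replacing it with a call to memcpy().
 */
void *memcpy_nommu_sketch(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Copy one byte at a time, lowest address first. */
    while (n--)
        *d++ = *s++;

    /* Like memcpy(), return the destination pointer. */
    return dst;
}
```

Bootstrap code that runs before the MMU is enabled could call this routine directly, leaving the standard memcpy() free to assume that unaligned accesses work.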