On 11 June 2012 02:14, Michael Hope <michael.hope@linaro.org> wrote:
We talked at Connect about finishing up the cortex-strings work by upstreaming the routines into Bionic, Newlib, and GLIBC. I've written up one of our standard 'Output' pages:
https://wiki.linaro.org/WorkingGroups/ToolChain/Outputs/CortexStrings
with a summary of what we did, what else exists, benchmark results, and next steps. This can be used to justify the routines to the different upstreams.
The Android guys are going to upstream these to Bionic. I need a volunteer to do Newlib and GLIBC.
One surprise was that the Newlib plain C routines are very good on strings - probably due to a good end of string detector.
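For readers unfamiliar with the term, an "end of string detector" in plain C usually means a word-at-a-time test for a zero byte. The sketch below shows that general technique for a 32-bit word. It is not copied from Newlib, and the names has_zero_byte and sketch_strlen are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the four bytes in x is zero.
 * Subtracting 1 from each byte borrows into bit 7 only when the byte
 * was 0x00 or already had bit 7 set; "& ~x" removes the second case. */
static int has_zero_byte(uint32_t x)
{
    return ((x - 0x01010101u) & ~x & 0x80808080u) != 0;
}

/* Hypothetical strlen built on the detector above (illustration only). */
size_t sketch_strlen(const char *s)
{
    const char *p = s;

    /* Step one byte at a time until p is 4-byte aligned. */
    while (((uintptr_t)p & 3u) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Step one word at a time until a word contains the terminator.
     * Assumption: an aligned 4-byte read may touch up to 3 bytes past
     * the terminator. It cannot cross a page boundary, but it is
     * outside what strict ISO C guarantees. */
    for (;;) {
        uint32_t w;
        memcpy(&w, p, sizeof w);   /* aligned load without aliasing issues */
        if (has_zero_byte(w))
            break;
        p += 4;
    }

    /* Locate the exact terminator inside the final word. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

The word loop does one load and a handful of ALU operations per four characters, which is one plausible reason a plain C routine can stay close to hand-written assembly on short strings.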
Those graphs end at 4 KB, which fits comfortably in even the L1 cache. How do these functions compare for sizes that spill into L2 or external memory? I would expect functions that do some prefetching to perform better there. Some time ago, I compared a few memcpy() implementations on large blocks, and the Bionic NEON-optimised one was several times faster than the glibc one. It is of course possible that glibc has improved since then.
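To make the question concrete, here is a minimal sketch of how such a comparison could be extended past 4 KB. It is not the harness behind the graphs on the wiki page. The function name bench_memcpy and its parameters are made up for illustration, and it times whatever memcpy() the program is linked against using the standard clock() call.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Copy a block of `size` bytes repeatedly until roughly `total` bytes
 * have been moved, and return the throughput in MB/s (10^6 bytes).
 * Returns -1.0 if allocation fails and 0.0 if the run was too short
 * for clock() to measure. Hypothetical helper, illustration only. */
double bench_memcpy(size_t size, size_t total)
{
    /* Calling through a volatile function pointer keeps the compiler
     * from inlining or eliding the copy, so the library routine runs. */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;

    char *src = malloc(size);
    char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch both buffers so page faults happen before timing starts. */
    memset(src, 0x5a, size);
    memset(dst, 0, size);

    size_t reps = total / size;
    if (reps == 0)
        reps = 1;

    clock_t t0 = clock();
    for (size_t i = 0; i < reps; i++)
        copy(dst, src, size);
    clock_t t1 = clock();

    free(src);
    free(dst);

    double seconds = (double)(t1 - t0) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;
    return ((double)reps * (double)size) / seconds / 1e6;
}
```

Calling this for block sizes doubling from 4 KB up to, say, 64 MB (with a fixed total of a few hundred MB per size) should show where throughput drops as the working set leaves L1 and then L2. The exact break points depend on the cache sizes of the core being tested.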