Hi Dave. I've been hacking away and have checked in a couple of benchmarking and plotting scripts to lp:cortex-strings. The current results are at: http://people.linaro.org/~michaelh/incoming/strings-performance/
All were run on an A9. The results are still very incomplete because the benchmarks take a long time to run. I'll leave ursa3 running them over the weekend, which should flesh out the results for the other routines.
Your new memcpy() is looking good as well - as fast as GLIBC.
-- Michael