Hi,
Also available is an early release of optimised string routines for the Cortex-A series, including a mix of NEON and Thumb-2 versions of memcpy(), memset(), strcpy(), strcmp(), and strlen(). For more information see: https://launchpad.net/cortex-strings
My understanding is that the NEON optimisation gives some performance gain *ONLY* on Cortex-A8, and that it also burns more energy. On other CPUs, e.g. Cortex-A9, there is no performance gain, yet it still costs more energy. The Linaro toolchain does not target a specific platform; it is generic across ARMv7 platforms. Are you expecting to see those optimisations turned on in the Linaro toolchain?
The NEON-optimised version is also beneficial for large copies, but not for short copies when the NEON unit first has to be powered up (the Linux kernel takes an exception to turn it on). I guess your benchmark didn't take that into account. Can the NEON-optimised version be changed so that it is not used for small copies?
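To make that last question concrete, here is a rough sketch of the kind of size-based dispatch I have in mind. It is not taken from cortex-strings: the names copy_thumb2(), copy_neon(), select_copy() and dispatch_memcpy() are hypothetical, both copy routines are stand-ins that simply call memcpy(), and the 256-byte cut-off is an assumed placeholder that would have to be measured on each core.

```c
#include <stddef.h>
#include <string.h>

/* Assumed cut-off in bytes; the real value would need benchmarking. */
#define NEON_COPY_THRESHOLD 256

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Stand-in for a Thumb-2 (integer-only) copy routine. */
static void *copy_thumb2(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Stand-in for a NEON copy routine. */
static void *copy_neon(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Pick the routine by size so that short copies never touch NEON. */
static copy_fn select_copy(size_t n)
{
    return n < NEON_COPY_THRESHOLD ? copy_thumb2 : copy_neon;
}

/* Entry point: same contract as memcpy(), returns dst. */
void *dispatch_memcpy(void *dst, const void *src, size_t n)
{
    return select_copy(n)(dst, src, n);
}
```

With a check like this, a short copy would stay on the integer path and would not by itself cause the NEON unit to be turned on; only copies at or above the threshold would pay that cost.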
Guillaume