On 27 August 2013 22:53, Michael Hudson-Doyle <michael.hudson@linaro.org> wrote:
I am not aware of any immediate plans for adding new functions, but we have a few JIRA cards open to write better implementations of various string functions. memset in particular looks like it should be fairly easy to optimise further with NEON.
OK. The main libc functions I've seen showing up in perf reports have been memcpy and malloc/free so I think you're on target there...
memcpy has been improved in glibc 2.18 and backported into Linaro 13.07 toolchains, so that should be much better than it was. I'm looking into improving malloc/free performance at the moment.
We have tests and benchmarks for string functions in the cortex-strings package on Launchpad, and new code has been developed there under a liberal license before being pushed to glibc/newlib/bionic/etc.
Right. But as far as you know offhand there's nothing in the pipeline that is likely to have different optimal implementations on A8/A9/A15?
Nothing to my knowledge, but there are people who could maybe give a more detailed answer on this. ;-)
Note that ifunc resolvers get passed the HWCAP bits from the kernel, so that is the granularity at which a decision can easily be made. Essentially, you can dispatch on the presence of NEON or VFP, but not on micro-architectural details (although in theory the resolver could extract that information itself, if it is easily accessible from userspace).
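To make the HWCAP granularity concrete, here is a minimal sketch of such a dispatch. The names my_memset, memset_generic, memset_neon, pick_memset and resolve_my_memset are hypothetical and not taken from glibc or cortex-strings, and both "implementations" are placeholders. The bit values are the ones from the 32-bit ARM Linux <asm/hwcap.h>, repeated locally only so the selection logic also compiles on other hosts.

```c
#include <stddef.h>
#include <string.h>

/* Bit values as in the 32-bit ARM Linux <asm/hwcap.h>; defined here only
 * so that the sketch also builds on non-ARM hosts. */
#ifndef HWCAP_VFP
#define HWCAP_VFP  (1UL << 6)
#endif
#ifndef HWCAP_NEON
#define HWCAP_NEON (1UL << 12)
#endif

typedef void *(*memset_fn)(void *, int, size_t);

/* Hypothetical placeholder implementations. A real NEON variant would use
 * vector stores; this one just defers to the library memset. */
static void *memset_generic(void *s, int c, size_t n)
{
    unsigned char *p = s;
    while (n--)
        *p++ = (unsigned char)c;
    return s;
}

static void *memset_neon(void *s, int c, size_t n)
{
    return memset(s, c, n);
}

/* The selection step: all it can look at is the HWCAP bit mask, so it can
 * tell "has NEON" from "no NEON", but not an A8 from an A9 or an A15. */
static memset_fn pick_memset(unsigned long hwcap)
{
    return (hwcap & HWCAP_NEON) ? memset_neon : memset_generic;
}

#if defined(__arm__) && defined(__GNUC__) && defined(__linux__)
/* On ARM, the dynamic loader calls the resolver with the HWCAP word as its
 * argument, and binds my_memset to whatever function it returns. */
static memset_fn resolve_my_memset(unsigned long hwcap)
{
    return pick_memset(hwcap);
}

void *my_memset(void *s, int c, size_t n)
    __attribute__((ifunc("resolve_my_memset")));
#endif
```

Keeping the selection in a plain function of the bit mask means it can be exercised on any machine by passing in masks by hand, e.g. pick_memset(HWCAP_NEON | HWCAP_VFP).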
Oh hm. So you can't easily separate A8/A9/A15 per se? That's potentially unfortunate from what I hear, but is quite a way away from my area of expertise :-)
Not using the HWCAP bits. It may be possible to pull the information out of a register somewhere, but AFAIK it's privilege protected so not possible from userspace.
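One thing userspace can see, offered only as a sketch: on ARM Linux the kernel prints a "CPU part" line in /proc/cpuinfo, and for ARM Ltd. cores the values 0xc08, 0xc09 and 0xc0f correspond to Cortex-A8, Cortex-A9 and Cortex-A15. The helper names below (parse_cpu_part, cpu_part_name, read_cpu_part) are hypothetical, and whether doing file I/O like this is acceptable inside an ifunc resolver is a separate question that this does not answer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse a single /proc/cpuinfo line. Returns the part number if this is a
 * "CPU part" line, or -1 otherwise. */
static long parse_cpu_part(const char *line)
{
    static const char key[] = "CPU part";
    const char *colon;

    if (strncmp(line, key, sizeof key - 1) != 0)
        return -1;
    colon = strchr(line, ':');
    if (colon == NULL)
        return -1;
    /* Base 16 skips leading whitespace and accepts the "0x" prefix. */
    return strtol(colon + 1, NULL, 16);
}

/* Map a part number to a name. Only meaningful when the implementer is
 * ARM Ltd. (0x41 on the "CPU implementer" line), which is not checked here. */
static const char *cpu_part_name(long part)
{
    switch (part) {
    case 0xc08: return "Cortex-A8";
    case 0xc09: return "Cortex-A9";
    case 0xc0f: return "Cortex-A15";
    default:    return "unknown";
    }
}

/* Return the part number of the first CPU listed in the given file
 * (normally "/proc/cpuinfo"), or -1 if none is found. */
static long read_cpu_part(const char *path)
{
    char line[256];
    long part = -1;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    while (part < 0 && fgets(line, sizeof line, f) != NULL)
        part = parse_cpu_part(line);
    fclose(f);
    return part;
}
```

Typical use would be cpu_part_name(read_cpu_part("/proc/cpuinfo")). Note that this only reports the first core listed, so it says nothing about systems that mix core types.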