Hi all,
There has been interest from LEG members to ensure that optimal library routines are used on their platforms. My understanding is that the "correct" way of doing this these days is to use ifuncs to select the best implementation for a given system.
I see that glibc 2.18 contains an ifunc-ed version of memcpy. Does the TCWG have a hit list of other functions that might get the same treatment? If so, does it have a plan and the resources to implement them? If it's a matter of resources, I think LEG might be able to help there.
Cheers, mwh
I think a better way of handling some of these functions is to use ifunc with a vDSO. That way glibc does not have to be updated; only the kernel does.
Thanks, Andrew
linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
On 27 August 2013 04:16, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
Hi Michael,
> There has been interest from LEG members to ensure that optimal library routines are used on their platforms. My understanding is that the "correct" way of doing this these days is to use ifuncs to select the best implementation for a given system.
> I see that glibc 2.18 contains an ifunc-ed version of memcpy. Does the TCWG have a hit list of other functions that might get the same treatment? If so, does it have a plan and the resources to implement them? If it's a matter of resources, I think LEG might be able to help there.
I am not aware of any immediate plans for adding new functions, but we have a few JIRA cards open to write better implementations of various string functions. memset in particular looks like it should be fairly easy to optimise further with NEON. We have tests and benchmarks for string functions in the cortex-strings package on launchpad and development of new code has been done there under a liberal license before pushing to glibc/newlib/bionic/etc. There's been some investigation into optimising libm too, but I think any work on that would likely be further away still.
Note that ifunc resolvers get passed the HWCAP bits from the kernel, so that is the granularity of decision that can simply be resolved - that essentially means you can dispatch based on the presence of NEON or VFP rather than say any micro-architectural details (although you could in theory extract this information in the resolver if it is easily accessible in userspace).
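To make the HWCAP point concrete, here is a minimal sketch of what such a resolver can and cannot see. It is an illustration only, not the glibc implementation: `my_memset`, `select_memset`, `memset_generic` and `memset_neon` are hypothetical names, and the "NEON" variant is a placeholder that reuses the generic loop. The bit positions are the ones 32-bit ARM Linux uses for VFP and NEON in its HWCAP word. The ifunc declaration is only compiled on 32-bit ARM Linux and assumes a glibc-based toolchain, where the resolver is called with the HWCAP word as its argument.

```c
#include <stddef.h>

/* HWCAP bit positions used by 32-bit ARM Linux for VFP and NEON. */
#define SKETCH_HWCAP_VFP  (1UL << 6)
#define SKETCH_HWCAP_NEON (1UL << 12)

typedef void *(*memset_fn)(void *, int, size_t);

/* Hypothetical baseline variant: a plain byte loop. */
static void *memset_generic(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    while (n--)
        *p++ = (unsigned char)c;
    return dst;
}

/* Hypothetical NEON variant. A real one would use NEON stores;
 * this placeholder just reuses the generic loop. */
static void *memset_neon(void *dst, int c, size_t n)
{
    return memset_generic(dst, c, n);
}

/* The selection step. All it gets is the feature bits, so it can tell
 * "has NEON" from "no NEON", but not a Cortex-A9 from a Cortex-A15. */
static memset_fn select_memset(unsigned long hwcap)
{
    return (hwcap & SKETCH_HWCAP_NEON) ? memset_neon : memset_generic;
}

#if defined(__arm__) && defined(__linux__)
/* Assumption: glibc on 32-bit ARM, which passes HWCAP to the resolver. */
static memset_fn resolve_my_memset(unsigned long hwcap)
{
    return select_memset(hwcap);
}

/* Calls to my_memset are bound once, at relocation time, to whatever
 * the resolver returned. */
void *my_memset(void *, int, size_t)
    __attribute__((ifunc("resolve_my_memset")));
#endif
```

Keeping the choice in a plain function like `select_memset` means it can be exercised on any host by passing in a made-up HWCAP value, for example `select_memset(SKETCH_HWCAP_NEON)`.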
Hi Will,
Thanks for the reply.
Will Newton will.newton@linaro.org writes:
> On 27 August 2013 04:16, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
> Hi Michael,
>> There has been interest from LEG members to ensure that optimal library routines are used on their platforms. My understanding is that the "correct" way of doing this these days is to use ifuncs to select the best implementation for a given system.
>> I see that glibc 2.18 contains an ifunc-ed version of memcpy. Does the TCWG have a hit list of other functions that might get the same treatment? If so, does it have a plan and the resources to implement them? If it's a matter of resources, I think LEG might be able to help there.
> I am not aware of any immediate plans for adding new functions, but we have a few JIRA cards open to write better implementations of various string functions. memset in particular looks like it should be fairly easy to optimise further with NEON.
OK. The main libc functions I've seen showing up in perf reports have been memcpy and malloc/free so I think you're on target there...
> We have tests and benchmarks for string functions in the cortex-strings package on launchpad and development of new code has been done there under a liberal license before pushing to glibc/newlib/bionic/etc.
Right. But as far as you know offhand there's nothing in the pipeline that is likely to have different optimal implementations on A8/A9/A15?
> There's been some investigation into optimising libm too, but I think any work on that would likely be further away still.
OK.
> Note that ifunc resolvers get passed the HWCAP bits from the kernel, so that is the granularity of decision that can simply be resolved - that essentially means you can dispatch based on the presence of NEON or VFP rather than say any micro-architectural details (although you could in theory extract this information in the resolver if it is easily accessible in userspace).
Oh hm. So you can't easily separate A8/A9/A15 per se? That's potentially unfortunate from what I hear, but is quite a way away from my area of expertise :-)
Cheers, mwh
On 27 August 2013 22:53, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
>> I am not aware of any immediate plans for adding new functions, but we have a few JIRA cards open to write better implementations of various string functions. memset in particular looks like it should be fairly easy to optimise further with NEON.
> OK. The main libc functions I've seen showing up in perf reports have been memcpy and malloc/free so I think you're on target there...
memcpy has been improved in glibc 2.18 and backported into Linaro 13.07 toolchains, so that should be much better than it was. I'm looking into improving malloc/free performance at the moment.
>> We have tests and benchmarks for string functions in the cortex-strings package on launchpad and development of new code has been done there under a liberal license before pushing to glibc/newlib/bionic/etc.
> Right. But as far as you know offhand there's nothing in the pipeline that is likely to have different optimal implementations on A8/A9/A15?
Nothing to my knowledge, but there are people that could maybe give a more detailed answer on this. ;-)
>> Note that ifunc resolvers get passed the HWCAP bits from the kernel, so that is the granularity of decision that can simply be resolved - that essentially means you can dispatch based on the presence of NEON or VFP rather than say any micro-architectural details (although you could in theory extract this information in the resolver if it is easily accessible in userspace).
> Oh hm. So you can't easily separate A8/A9/A15 per se? That's potentially unfortunate from what I hear, but is quite a way away from my area of expertise :-)
Not using the HWCAP bits. It may be possible to pull the information out of a register somewhere, but AFAIK it's privilege protected so not possible from userspace.
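For what it's worth, one route that does not need a privileged register read is the text the kernel already exports: on ARM Linux, /proc/cpuinfo carries a "CPU part" field (0xc08 for Cortex-A8, 0xc09 for Cortex-A9, 0xc0f for Cortex-A15). The sketch below only shows the parsing step on a string that has already been read. `parse_cpu_part` and `core_name` are hypothetical helper names, not existing library functions, and whether file I/O like this is safe or sensible from inside an ifunc resolver is a separate question that this does not answer.

```c
#include <stdlib.h>
#include <string.h>

/* Returns the "CPU part" value found in /proc/cpuinfo-style text,
 * or -1 if the field is missing. Only the first occurrence is used. */
static long parse_cpu_part(const char *cpuinfo)
{
    const char *key = strstr(cpuinfo, "CPU part");
    if (key == NULL)
        return -1;

    const char *colon = strchr(key, ':');
    if (colon == NULL)
        return -1;

    /* strtol skips the leading blank and accepts the "0x" prefix. */
    return strtol(colon + 1, NULL, 16);
}

/* Maps the part numbers discussed in this thread to core names. */
static const char *core_name(long part)
{
    switch (part) {
    case 0xc08: return "Cortex-A8";
    case 0xc09: return "Cortex-A9";
    case 0xc0f: return "Cortex-A15";
    default:    return "unknown";
    }
}
```

A caller would read /proc/cpuinfo into a buffer once at startup and pass it to `parse_cpu_part`. On a big.LITTLE system the cores can report different part numbers, so taking only the first entry is a simplification.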