On Wed, Aug 10, 2011 at 04:29:46PM +0100, Ramana Radhakrishnan wrote:
Sorry about the delayed response. I did notice your mail last week but I was busy with our conference and then the first couple of days this week have just disappeared with some internal training.
I would be interested in hearing how you get on with LTO and FDO on ARM. Listening to Honza talk at the GCC unconference in London about the memory usage for full LTO with trunk, I did wonder what we would get if we tried it on the ARM target, but I never managed to get around to trying anything there :). We did look at getting FDO working with Linaro GCC last cycle, but there are still a couple of issues with PGO in Linaro GCC 4.5.
I gave Linaro GCC 4.5 a try, and at first glance, it looks pretty much on par, performance-wise, with upstream GCC 4.6.1.
With respect to LTO, the one problem we have currently is that the Neon intrinsics aren't streamed out and streamed back in. So you might have a few issues if your code uses arm_neon.h. https://bugs.launchpad.net/gcc-linaro/+bug/823548 is an example of this problem. This was fixed upstream and we probably just need to backport that into our 4.6 tree. I've tried a backport this morning and I think I finally have this right.
I haven't tried LTO with Linaro GCC, but with upstream GCC 4.6.1, I hit something like the following: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41159
I'm currently running some more FDO tests, so I'll have more on that later. From my attempts so far, with a possibly more correct profile than in my very first attempts, FDO no longer seems to regress performance (as it did in those first attempts), but it doesn't look better either. The resulting binary is, however, much bigger.
Cheers,
Mike