 
            . Would you be interested in adding a Firefox-based benchmark? As a large application it is a good testbed for LTO, FDO and other aggressive optimizations.
Sorry about the delayed response. I did notice your mail last week but I was busy with our conference and then the first couple of days this week have just disappeared with some internal training.
I would be interested in hearing how you get on with LTO and FDO on ARM. Listening to Honza talking at the GCC unconference in London about the memory usage for full LTO with trunk I did wonder what would happen if we tried it on the ARM target to see what we got, but I never managed to get around to trying anything there :) . We did look at getting FDO working with Linaro GCC last cycle but there are still a couple of issues with PGO in Linaro GCC 4.5.
With respect to LTO, the one problem we have currently is that the Neon intrinsics aren't streamed out and streamed back in. So you might have a few issues if your code uses arm_neon.h. https://bugs.launchpad.net/gcc-linaro/+bug/823548 is an example of this problem. This was fixed upstream and we probably just need to backport that into our 4.6 tree. I've tried a backport this morning and I think I have this right finally.
If you can do a build and a Firefox benchmark run in about 30-60 minutes, then by all means please let us know how you get on and what you find. We've been steadily trying to improve the performance of the ARM toolchain; the biggest improvements you'll notice will be with the vectorizer, but there will be other small improvements in other general areas of code generation. We would be interested in feedback about what can be done, to add to our queue of things to look at and improve for the ARM port of GCC.
With respect to the images, Kiko's probably answered that bit.
cheers Ramana
 
            2011/8/10 Ramana Radhakrishnan ramana.radhakrishnan@linaro.org:
. Would you be interested in adding a Firefox-based benchmark? As a large application it is a good testbed for LTO, FDO and other aggressive optimizations.
I would be interested in hearing how you get on with LTO and FDO on ARM. Listening to Honza talking at the GCC unconference in London about the memory usage for full LTO with trunk I did wonder what would happen if we tried it on the ARM target to see what we got, but I never managed to get around to trying anything there :) . We did look at getting FDO working with Linaro GCC last cycle but there are still a couple of issues with PGO in Linaro GCC 4.5.
FYI, the toolchain benchmark suite derived from Google already includes the FDO mode, and I would suggest enabling it for comparisons.
The Android build system has had (incomplete) FDO integration since Android 2.2[*]. In my experience, it sometimes improves performance slightly in special cases.
Sincerely, -jserv
[*] The build system performs a "build-run-build" scheme with the help of ADB, which deploys the profiler on the target. Option: BUILD_FDO_INSTRUMENT
 
            On Wed, Aug 10, 2011 at 04:29:46PM +0100, Ramana Radhakrishnan wrote:
Sorry about the delayed response. I did notice your mail last week but I was busy with our conference and then the first couple of days this week have just disappeared with some internal training.
I would be interested in hearing how you get on with LTO and FDO on ARM. Listening to Honza talking at the GCC unconference in London about the memory usage for full LTO with trunk I did wonder what would happen if we tried it on the ARM target to see what we got, but I never managed to get around to trying anything there :) . We did look at getting FDO working with Linaro GCC last cycle but there are still a couple of issues with PGO in Linaro GCC 4.5.
I gave a try to Linaro GCC 4.5, and at first glance, it looks pretty much on par, performance-wise, with upstream GCC 4.6.1.
With respect to LTO, the one problem we have currently is that the Neon intrinsics aren't streamed out and streamed back in. So you might have a few issues if your code uses arm_neon.h. https://bugs.launchpad.net/gcc-linaro/+bug/823548 is an example of this problem. This was fixed upstream and we probably just need to backport that into our 4.6 tree. I've tried a backport this morning and I think I have this right finally.
I haven't tried LTO with Linaro GCC, but with upstream GCC 4.6.1, I hit something like the following: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41159
I'm currently running some more FDO tests, so I'll have more on that later. From my attempts so far, with a possibly more correct profile than in my very first attempts, it looks like it's not so bad (it used to regress in my first attempts), but it doesn't look better either. The resulting binary is, however, much bigger.
Cheers,
Mike
linaro-toolchain@lists.linaro.org


