Dear All
Our team at Samsung collected some performance metrics for the following three GCC cross compilers:
1. Gentoo compiler (part of the Chrome OS build environment)
2. GCC 4.4.1 (CodeSourcery)
3. Linaro (gcc-linaro-4.5-2010.11-1)
To build the Linaro toolchain we used Michael Hope's script, modifying only "GCCFLAGS = --with-mode=thumb --with-arch=armv7-a --with-float=softfp --with-fpu=neon --with-fpu=vfpv3-d16".
a. Using the above three toolchains we compiled the Chrome OS kernel and ran the CoreMark performance test (with the same optimisation flags, listed in the attachment).
b. The test environment was the same for all three.
My questions:
1. Are there any build options that I am missing when building the cross compiler?
2. If not, is this performance degradation a known issue, and is the toolchain group working on it? (If so, whom should I contact?)
Any pointers from you would be of great help to me. If you need any further details, do ping me.
Regards Prashanth S
On Wed, Dec 8, 2010 at 6:59 PM, Prashanth S prashanth.s@samsung.com wrote:
Dear All
Our team in Samsung collected some performance metrics for the following 3 GCC cross compilers
- Gentoo compiler (part of the Chrome OS build environment)
- GCC 4.4.1 (CodeSourcery)
- Linaro (gcc-linaro-4.5-2010.11-1)
To build the Linaro toolchain we used Michael Hope's script, modifying only "GCCFLAGS = --with-mode=thumb --with-arch=armv7-a --with-float=softfp --with-fpu=neon --with-fpu=vfpv3-d16".
Using the above three toolchains we compiled the Chrome OS kernel and ran the CoreMark performance test (with the same optimisation flags, listed in the attachment). The test environment was the same for all three.
My questions:
1. Are there any build options that I am missing when building the cross compiler?
2. If not, is this performance degradation a known issue, and is the toolchain group working on it? (If so, whom should I contact?)
Any pointers from you would be of great help to me. If you need any further details, do ping me.
Hi Prashanth. I'm a bit confused, as I'm talking with Sree from Samsung about the same topic at the moment. Here's what I said to him:
""" We run coremark along with a range of other tests with each continuous build. On a OMAP3 (Cortex-A8), we score 1570, plain GCC 4.4.4 scores 1431, and CodeSourcery GCC scores 1670. Note that all values are estimates, to fit in with the coremark reporting rules.
I'm not sure coremark is the best choice of benchmark as it's a synthetic benchmark with a deeply embedded focus. For comparison, we are 5 % ahead of CodeSourcery on pybench, up to 17 % ahead on h.264 decode, and 3 % ahead on Ogg/Vorbis decode. A large part of this is due to the upstream improvements between 4.4 and 4.5.
Given that, I'm currently looking into the difference between CodeSourcery GCC and Linaro GCC as there's a significant difference to be explained. """
Since then I've tracked it down further. It seems to be a regression between 4.4 and 4.5. I'm looking into running a wider suite of benchmarks on an A9 to decide on what to do next.
Hope that helps,
-- Michael
On Wed, Dec 8, 2010 at 7:10 PM, Michael Hope michael.hope@linaro.org wrote:
On Wed, Dec 8, 2010 at 6:59 PM, Prashanth S prashanth.s@samsung.com wrote:
Dear All
Our team in Samsung collected some performance metrics for the following 3 GCC cross compilers
- Gentoo compiler (part of the Chrome OS build environment)
- GCC 4.4.1 (CodeSourcery)
- Linaro (gcc-linaro-4.5-2010.11-1)
To build the Linaro toolchain we used Michael Hope's script, modifying only "GCCFLAGS = --with-mode=thumb --with-arch=armv7-a --with-float=softfp --with-fpu=neon --with-fpu=vfpv3-d16".
Some notes on the flags themselves:
The options in your email are quite decent:
  -mtune=cortex-a9 -mfloat-abi=softfp -mfpu=neon -ftree-vectorize -fomit-frame-pointer -ffast-math -mcpu=cortex-a9 -O3
You can prune that back a bit to:
  -mcpu=cortex-a9 -mfloat-abi=softfp -mfpu=neon -ffast-math -O3
as -O turns on -fomit-frame-pointer, -O3 turns on -ftree-vectorize, and -mcpu implies -mtune.
You can prune this back even further by configuring and building the compiler using --with-cpu and similar options. These set the defaults so you can use just:
  -ffast-math -O3
Be careful with -ffast-math. It will improve performance but means that floating point calculations no longer follow the full IEEE 754 standard. I'd turn this off until you verify individual packages.
-mfloat-abi=hard will give better performance but may involve a porting effort.
-fno-common also gives a small improvement in various benchmarks, but may break some programs.
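To make the -ffast-math caution above concrete: floating-point addition is not associative, and as far as I know -ffast-math (via -fassociative-math) allows the compiler to regroup sums. The sketch below is only an illustration under that assumption; the function names are made up and nothing here comes from the attachment. It computes the same three-term sum in two groupings, either of which the compiler may choose once strict IEEE 754 ordering is relaxed.

```c
#include <assert.h>

/* Hypothetical helpers: the same sum, grouped two ways. */

/* (a + b) + c */
double sum_left(double a, double b, double c)
{
    double t = a + b;
    return t + c;
}

/* a + (b + c) */
double sum_right(double a, double b, double c)
{
    double t = b + c;
    return a + t;
}

/* Under strict IEEE 754 double arithmetic the two groupings disagree
 * for these inputs: 1.0 is below half an ulp of 1e20, so it is lost
 * when added to -1e20 first. */
void check_grouping_matters(void)
{
    assert(sum_left(1e20, -1e20, 1.0) != sum_right(1e20, -1e20, 1.0));
}
```

A program whose results depend on the written grouping is the kind of package worth verifying before turning -ffast-math on.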
Note that the 'best' options depend on the individual program. It's not unusual for programs to sometimes do better with -O2 than -O3, or better without -ftree-vectorize. I'd be interested in any situations you run across.
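One way to check which options suit a given program is to time the same code built with each flag set. The following is a minimal sketch, not a proper benchmark harness: workload() is a hypothetical stand-in for the hot loop of a real program, and clock() measures CPU time only coarsely.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical workload: sum of squares below n, wrapping modulo 2^32. */
uint32_t workload(uint32_t n)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < n; i++)
        acc += i * i;
    return acc;
}

/* Returns the CPU seconds spent in one call to workload(n). */
double seconds_for(uint32_t n)
{
    volatile uint32_t sink;   /* keeps the call from being optimised away */
    clock_t start = clock();
    sink = workload(n);
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / (double)CLOCKS_PER_SEC;
}

/* Sanity check so that every build is timing the same computation. */
void check_workload(void)
{
    assert(workload(4u) == 14u);   /* 0 + 1 + 4 + 9 */
}
```

Build the file once per flag set (for example once with -O2 and once with -O3), call seconds_for() with a large n in each build, and compare the times over several runs rather than trusting a single measurement.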
-- Michael
On Thu, 2010-12-09 at 13:31 +1300, Michael Hope wrote:
On Wed, Dec 8, 2010 at 7:10 PM, Michael Hope michael.hope@linaro.org wrote:
-fno-common also gives a small improvement in various benchmarks, but may break some programs.
Any breakage with -fno-common would be detected at link time, so if your program compiles successfully it should run successfully.
The breakage is that some non-strictly conforming programs that declare variables tentatively with, for example:
int x;
in multiple translation units will get multiple-definition errors at link time.
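For illustration, the usual fix is to keep one definition and declare the variable extern everywhere else. The sketch below shows that layout collapsed into a single file; the file names in the comments and the bump() helper are hypothetical.

```c
#include <assert.h>

/* Problem pattern (not compiled here): both a.c and b.c contain
 *     int x;
 * With -fno-common each file emits a real definition of x, so the
 * link step reports a multiple definition. */

/* counter.h (hypothetical): declaration only, safe to include anywhere */
extern int x;

/* counter.c (hypothetical): the one and only definition */
int x = 0;

/* user.c (hypothetical): refers to x through the extern declaration */
int bump(void)
{
    return ++x;
}

/* Small self-check that bump() updates the single shared x. */
void check_bump(void)
{
    int before = x;
    assert(bump() == before + 1);
}
```

Laid out this way the program links the same with or without -fno-common.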
R.
On Wed, Dec 8, 2010 at 7:10 PM, Michael Hope michael.hope@linaro.org wrote:
On Wed, Dec 8, 2010 at 6:59 PM, Prashanth S prashanth.s@samsung.com wrote:
Dear All
Our team in Samsung collected some performance metrics for the following 3 GCC cross compilers
- Gentoo compiler (part of the Chrome OS build environment)
- GCC 4.4.1 (CodeSourcery)
- Linaro (gcc-linaro-4.5-2010.11-1)
To build the Linaro toolchain we used Michael Hope's script, modifying only "GCCFLAGS = --with-mode=thumb --with-arch=armv7-a --with-float=softfp --with-fpu=neon --with-fpu=vfpv3-d16".
Using the above three toolchains we compiled the Chrome OS kernel and ran the CoreMark performance test (with the same optimisation flags, listed in the attachment). The test environment was the same for all three.
Hi Prashanth. I've been looking at benchmarks and various compilers during the week. Have a look at: https://wiki.linaro.org/MichaelHope/Sandbox/Benchmarks
It's far from complete and only contains a couple of benchmarks on the A8 only, but it makes interesting Friday afternoon reading.
Note the difference between coremark and pybench which reinforces the need to pick a benchmark that reflects your workload. coremark is interesting as the difference is almost solely due to the 'crcu8' function. GCC 4.4 based compilers unroll it where GCC 4.5 based compilers don't. Adding '-funroll-all-loops' (which I don't recommend you do on normal applications) brings the 4.5 based compilers back in front.
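To show what unrolling means for a loop of this kind, here is a rough sketch of a bitwise CRC update over one byte, written once as a loop and once unrolled by hand. This is not the coremark source: the polynomial (0xA001) and all the function names are assumptions chosen for the example, and a compiler's own unrolling may look quite different.

```c
#include <assert.h>
#include <stdint.h>

/* One bit of a reflected CRC-16 update (polynomial 0xA001, assumed). */
static uint16_t crc_step(uint16_t crc)
{
    return (crc & 1u) ? (uint16_t)((crc >> 1) ^ 0xA001u)
                      : (uint16_t)(crc >> 1);
}

/* Rolled form: eight iterations, with a counter and branch per bit. */
uint16_t crc_rolled(uint8_t data, uint16_t crc)
{
    crc ^= data;
    for (int i = 0; i < 8; i++)
        crc = crc_step(crc);
    return crc;
}

/* Unrolled form: the same eight steps with the loop overhead removed. */
uint16_t crc_unrolled(uint8_t data, uint16_t crc)
{
    crc ^= data;
    crc = crc_step(crc); crc = crc_step(crc);
    crc = crc_step(crc); crc = crc_step(crc);
    crc = crc_step(crc); crc = crc_step(crc);
    crc = crc_step(crc); crc = crc_step(crc);
    return crc;
}

/* Returns 1 if both forms agree over a spread of inputs, else 0. */
int crc_variants_agree(void)
{
    for (unsigned c = 0; c <= 0xFFFFu; c += 257u)
        for (unsigned d = 0; d <= 0xFFu; d++)
            if (crc_rolled((uint8_t)d, (uint16_t)c) !=
                crc_unrolled((uint8_t)d, (uint16_t)c))
                return 0;
    return 1;
}
```

Both forms compute the same value; the difference is only in loop overhead and code size, which is why the unrolling heuristics can swing a small benchmark like this one.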
There is work to do here though. It seems that the unrolling heuristics could be improved, and the 4.4 compilers still win on coremark at -O3.
More to come...
-- Michael
On Wed, Dec 8, 2010 at 5:59 AM, Prashanth S prashanth.s@samsung.com wrote:
Dear All
Our team in Samsung collected some performance metrics for the following 3 GCC cross compilers
- Gentoo compiler (part of the Chrome OS build environment)
- GCC 4.4.1 (CodeSourcery)
- Linaro (gcc-linaro-4.5-2010.11-1)
To build the Linaro toolchain we used Michael Hope's script, modifying only "GCCFLAGS = --with-mode=thumb --with-arch=armv7-a --with-float=softfp --with-fpu=neon --with-fpu=vfpv3-d16".
Does anyone know the effect of passing two -mfpu= options? I suspect the second may override the first.
Also, we tend to avoid using -mfpu=neon by default, because a) it's not expected to bring a big benefit for general-purpose code, and b) the resulting code isn't compatible with v7 implementations such as Marvell's or NVIDIA's which don't support NEON. It would be interesting to compare with this option removed -- hopefully the results will come out about the same.
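One way to see which FPU setting actually took effect is to ask the compiler through its predefined macros. The sketch below assumes the macro names that GCC's ARM back end defines to the best of my knowledge (__ARM_NEON__, __VFP_FP__, __SOFTFP__, __arm__); please confirm them against the output of "gcc -dM -E - < /dev/null" for your toolchain. The function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Reports the FPU the compiler is targeting for this translation unit,
 * based on predefined macros (names assumed, see the note above). */
const char *fpu_summary(void)
{
#if defined(__ARM_NEON__)
    return "neon";
#elif defined(__VFP_FP__) && !defined(__SOFTFP__)
    return "vfp";
#elif defined(__arm__)
    return "arm-no-fpu";
#else
    return "not-arm";
#endif
}

/* Returns 1 if this file was compiled with NEON enabled, else 0. */
int built_for_neon(void)
{
    return strcmp(fpu_summary(), "neon") == 0;
}

/* The two helpers must agree with each other on any target. */
void check_summary_consistent(void)
{
    assert(built_for_neon() == (strcmp(fpu_summary(), "neon") == 0));
}
```

Compiling this with the default options of the freshly built cross compiler, and again with -mfpu=neon removed or overridden, should show whether the second --with-fpu value replaced the first.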
Cheers ---Dave
On 08/12/10 05:59, Prashanth S wrote:
Dear All
Our team in Samsung collected some performance metrics for the following 3 GCC cross compilers
- Gentoo compiler (part of the Chrome OS build environment)
- GCC 4.4.1 (CodeSourcery)
- Linaro (gcc-linaro-4.5-2010.11-1)
I'd be interested to see what figures you get for CodeSourcery 2010.09 (i.e. gcc 4.5) and/or Linaro 4.4 releases. Right now we're not comparing like with like.
Linaro has a slightly stricter patch acceptance policy than CodeSourcery, and this does hurt the figures slightly, but I'd expect them to be roughly equivalent within the same compiler version.
Andrew
linaro-toolchain@lists.linaro.org