On 07/02/2017 13:50, Bharat Bhushan wrote:
Hi All,
I am working on log10/qsort benchmarks on an ARM64 (ARMv8) processor, and I want to check whether anyone has experience with these benchmarks. Specifically, I am looking for the compiler version and optimization flags that give the best results on them (in my case, -O3 gives the best numbers).
I have tried GCC 4.9 and GCC 6.2 with the log10 benchmark, and my observations are:
1) With GCC 4.9: 140 us
2) With GCC 6.2: 150 us
My compilation flags are "-O3 -ftree-vectorize -funroll-all-loops --param max-inline-insns-auto=550 --param case-values-threshold=30 -falign-functions=32 -ftracer"
So it seems that GCC 4.9 gives the better (lower) time. Am I missing something, or should I use better compiler flags?
It is really hard to give you any advice without the actual code to check what exactly you are measuring. Are you using a custom log10 implementation or the glibc one?
The compiler options seem to be what you would expect for a mathematical workload; however, I would profile and check whether '-funroll-all-loops' and '--param max-inline-insns-auto=550 --param case-values-threshold=30' are actually helping in this case. All of them tend to increase code size, which may or may not put pressure on the instruction cache; it really depends on the workload and its dataflow.
In any case, it would be good to profile the code to find exactly where the hotspot is and, based on the code and its characteristics, check whether other flags or other kinds of optimization (PGO, IPA) can help.