Re: Compiler selection for log10/Qsort on ARM64

7 Feb 2017

On 07/02/2017 13:50, Bharat Bhushan wrote:
Hi All,
I am working on log10/qsort benchmarks on an ARM64 (ARMv8) processor,
and I want to check whether we have experience with these benchmarks.
I am looking for the compiler version and the specific compiler optimization options that give the best results with these benchmarks (in my case I see -O3 gives the best numbers).
I have tried GCC 4.9 and GCC 6.2 with the log10 benchmark, and my observations are:

     1)      With GCC 4.9   -   140 us

     2)      With GCC 6.2   -   150 us

My compilation flags are "-O3 -ftree-vectorize -funroll-all-loops --param max-inline-insns-auto=550 --param case-values-threshold=30 -falign-functions=32 -ftracer"
So it seems like GCC 6.2 is better. Am I missing something? Should I use some better compiler flags?

It is really hard to give you any advice without the actual code to check what
exactly you are measuring.  Are you using a custom implementation of log10 or
the glibc one?
The compiler options seem to be what you would expect to use for a mathematical
workload; however, I would profile and check whether both '-funroll-all-loops'
and '--param max-inline-insns-auto=550 --param case-values-threshold=30' are
actually helping in this case.  All of them tend to increase code size, which
may or may not put pressure on the icache; it really depends on the workload
and dataflow.
In any case, it would be good to profile the code to check exactly where
the hotspot is and, based on the code and its characteristics, check whether
any other flags or other kinds of optimization (PGO, IPA) can help you out.
