Hi,
I ran some tests on the following function:
--- CUT HERE ---
int fibo(int n)
{
    if (n < 2) return 1;
    return (fibo(n-2) + fibo(n-1));
}
--- CUT HERE ---
and I discovered that on some hardware it is faster with -O2 than with -O3 (see the timings below). This is with gcc 4.9.2.
Looking at the disassembly I see it is using FP registers to hold integer values. The following is a small extract.
.L3:
        fmov    w0, s8
        sub     w25, w25, #1
        cmn     w25, #1
        add     w0, w0, w27
        fmov    s8, w0
        bne     .L19
        add     w0, w0, 1
        b       .L2
Recompiling with -mgeneral-regs-only gives a large improvement on some hardware (up to about 36% less run time than plain -O3 in the figures below).
The following are the times I get on various partner HW. I have normalised the -O2 times to 1 second so that I do not disclose actual partner performance data:
            -O2     -O3       -O3 -mgeneral-regs-only
Partner 1:  1sec    1.13sec   0.72sec
Partner 2:  1sec    0.68sec   0.60sec
Partner 3:  1sec    0.73sec   0.68sec
Partner 4:  1sec    0.83sec   0.84sec
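So, in general, -O3 does actually do better than -O2 (Partner 1 being the exception), and in every case except Partner 4, where the two -O3 times are within 0.01sec of each other, performance is better if I stop it using FP registers for int values.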
I have put a tarball of the test program along with 3 binaries and 3 disassemblies here:-
http://people.linaro.org/~edward.nevill/fibo.tar
All the best, Ed.