On 23 May 2013 04:09, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
Mans Rullgard mans.rullgard@linaro.org writes:
If you need to disable CSE for part of the code, you might want to try your luck with __attribute__((optimize("no-gcse"))) on the relevant functions.
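For illustration, a minimal sketch of what the per-function attribute could look like. The `NO_GCSE` macro, the toy opcodes and the `run` function are hypothetical stand-ins for whatever hot function is actually affected; none of them come from the code under discussion. The attribute is GCC-specific, so the macro expands to nothing on other compilers.

```c
#include <assert.h>
#include <stddef.h>

/* GCC-specific; expands to nothing elsewhere so the file still builds. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_GCSE __attribute__((optimize("no-gcse")))
#else
#define NO_GCSE
#endif

/* Hypothetical opcodes for a toy stack machine (illustration only). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Only this function is compiled as if -fno-gcse had been given; the
   rest of the translation unit keeps the command-line optimization flags. */
NO_GCSE
long run(const int *code)
{
    long stack[16];   /* fixed size, no overflow checks: sketch only */
    size_t sp = 0;

    assert(code != NULL);
    for (;;) {
        switch (*code++) {
        case OP_PUSH: stack[sp++] = *code++; break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT: return sp ? stack[sp - 1] : 0;
        default:      return -1;   /* unknown opcode */
        }
    }
}
```

Building the rest of the file with the usual flags (say, -O2) and comparing against a global -fno-gcse build should show whether the gain comes from this one function.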
I'd also be interested if you think this class of optimization makes little sense on ARM and then I'll stop and find something else to do :-)
I suggest running some benchmarks under perf and counting branch prediction misses. Maybe it's not as much of a problem as you think.
Well, I recompiled with -fno-gcse globally and the change now does result in a reasonable performance increase, in the 3-7% range. perf stat suggests that this is because it reduces the overall number of branches rather than the rate of branch misses, though...
Correctly predicted branches are more or less free, as are unconditional ones to a fixed target. If the number of mispredicted branches changes, that's interesting.
It would also be interesting to see the numbers if you apply the above-mentioned attribute to the relevant function(s) and leave CSE otherwise enabled. CSE is generally a good thing, so disabling it globally is probably not that great an idea.
BTW, what hardware are you testing this on?