Mans Rullgard <mans.rullgard@linaro.org> writes:
On 22 May 2013 05:13, Michael Hudson-Doyle <michael.hudson@canonical.com> wrote:
Hi all,
I've spent a little while porting an optimization from Python 3 to Python 2.7 (http://bugs.python.org/issue4753). The idea of the patch is to improve performance by dispatching opcodes on computed labels rather than a big switch -- and so confusing the branch predictor less.
The problem with this is that the final dispatch code for each opcode ends up being identical, so common subexpression elimination wants to coalesce all of those copies back into a single indirect jump, which neatly and completely nullifies the point of the optimization.
The branches added by this would be unconditional and should thus not add any load on the branch predictor.
Playing around just building from source directly, it seems that -fno-gcse prevents gcc from doing this, and the resulting interpreter shows a small performance improvement over a build that does not include the patch.
However, when I build a Debian package containing the patch, I see no improvement at all. My theory, and I'd like you guys to tell me if this makes sense, is that the Debian package uses link time optimization, and so even though I carefully compile ceval.c with -fno-gcse, the common subexpression elimination happens anyway at link time. I've tried staring at the disassembly to confirm or deny this, but I don't know ARM assembly very well and the compiled function is roughly 10k instructions long, so I didn't get very far with this (I can supply the disassembly if someone wants to see it!).
Is there some way I can tell GCC not to perform CSE on a section of code? I guess I can make sure that the whole program, linker step and all, is compiled with -fno-gcse, but that seems a bit of a blunt hammer.
When using LTO, most of the optimisations happen, as the name implies, during linking. The optimisation flags provided there, whether explicit or default, are used for everything.
OK. I wasn't sure initially whether the optimizations that were performed at link time were the same as the ones that are traditionally performed at compile time, but reading the docs again makes it clear (ish) that they are.
If you need to disable CSE for part of the code, you might want to try your luck with __attribute__((optimize("no-gcse"))) on the relevant functions.
I'd also be interested if you think this class of optimization makes little sense on ARM and then I'll stop and find something else to do :-)
I suggest running some benchmarks under perf and counting branch prediction misses. Maybe it's not as much of a problem as you think.
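Something along these lines would show the relevant counters (the binary and benchmark names are placeholders, and this needs hardware performance counters, so it won't run everywhere):

```shell
# Compare branch counts and miss rates between the two builds.
perf stat -e branches,branch-misses,instructions,cycles \
    ./python-patched benchmark.py
perf stat -e branches,branch-misses,instructions,cycles \
    ./python-unpatched benchmark.py
```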
Well, I recompiled with -fno-gcse globally and the change now does result in a reasonable performance increase, in the 3-7% range. perf stat suggests that this is because it reduces the overall number of branches rather than the rate of branch misses, though...
Cheers, mwh