Zhenqiang's been working on the later split2 patch, which causes more constants to be built with a movw/movt pair instead of a constant pool load. There was an unexpected ~10% regression in one benchmark which seems to be due to function alignment. I think we've tracked down the cause, but not yet the fix.
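For context, the difference looks roughly like this (the function and constant below are made up for illustration):

    /* Illustrative only -- not from the benchmark. */
    unsigned int get_const(void)
    {
        return 0x12345678u;
    }

    /* With the split2 change this can be compiled to an inline pair:
     *     movw  r0, #0x5678   @ low halfword
     *     movt  r0, #0x1234   @ high halfword
     *     bx    lr
     * whereas the baseline emits a PC-relative constant pool load:
     *     ldr   r0, .LCP0
     *     bx    lr
     * .LCP0:
     *     .word 0x12345678
     */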
Compared to the baseline, the split2 branch took 113% of the time to run, i.e. 13% longer. Adding an explicit 16-byte alignment to the function changed this to 97% of the time, i.e. 3% faster. The reason Zhenqiang and I got different results was the build ID: he used the binary build scripts to make the cross compiler, which turn on the build ID, and the build ID note added an extra 20 bytes ahead of .text, which happened to align the function to 16 bytes. cbuild doesn't use the build ID (although it should), which happened to leave the function on an 8-byte boundary.
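For reference, one way to force that kind of alignment from the source is GCC's function alignment attribute (a sketch, not necessarily how the benchmark was modified; the function name is made up):

    /* Sketch: pin a function to a 16-byte boundary regardless of what
     * the build ID note (or anything else) places ahead of it in .text. */
    __attribute__((aligned(16)))
    void hot_function(void)
    {
        /* ... benchmark kernel ... */
    }

Whether a binary carries the build ID note can be checked with readelf -n.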
The disassembly is identical, so I assume the regression is cache- or fast-loop-related. I'm not sure what to do, so let's talk about this at the next performance call.
-- Michael
A small case is attached to reproduce it.
Here are logs for different loop header alignments (default is 64):

linaro@Linaro-test:~$ gcc test1.c -o t.exe && time ./t.exe

real    0m3.206s
user    0m3.203s
sys     0m0.000s

linaro@Linaro-test:~$ gcc test1.c -DALIGNED_2 -o t.exe && time ./t.exe

real    0m2.898s
user    0m2.875s
sys     0m0.016s

linaro@Linaro-test:~$ gcc test1.c -DALIGNED_4 -o t.exe && time ./t.exe

real    0m2.851s
user    0m2.844s
sys     0m0.008s

linaro@Linaro-test:~$ gcc test1.c -DALIGNED_8 -o t.exe && time ./t.exe

real    0m3.167s
user    0m3.156s
sys     0m0.000s
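(The attached test1.c doesn't survive in the archive. Below is a minimal sketch of what such a reproducer could look like; the loop body, the iteration count, and the reading of ALIGNED_N as a 2^N-byte loop header alignment are all assumptions.)

    /* Hypothetical reconstruction -- not the actual attachment. */
    #include <stdio.h>

    int main(void)
    {
        volatile unsigned int sum = 0;
        unsigned int i;

    #if defined(ALIGNED_8)
        asm volatile (".p2align 8");   /* pad the following code to 256 bytes */
    #elif defined(ALIGNED_4)
        asm volatile (".p2align 4");   /* ...to 16 bytes */
    #elif defined(ALIGNED_2)
        asm volatile (".p2align 2");   /* ...to 4 bytes */
    #endif
        /* Hot loop; its header alignment shifts with the directive above. */
        for (i = 0; i < 500000000u; i++)
            sum += i;

        printf("%u\n", sum);
        return 0;
    }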
Thanks! -Zhenqiang
On 23 August 2012 10:09, Michael Hope <michael.hope@linaro.org> wrote:
> Zhenqiang's been working on the later split2 patch, which causes more constants to be built with a movw/movt pair instead of a constant pool load. There was an unexpected ~10% regression in one benchmark which seems to be due to function alignment. I think we've tracked down the cause, but not yet the fix.
> Compared to the baseline, the split2 branch took 113% of the time to run, i.e. 13% longer. Adding an explicit 16-byte alignment to the function changed this to 97% of the time, i.e. 3% faster. The reason Zhenqiang and I got different results was the build ID: he used the binary build scripts to make the cross compiler, which turn on the build ID, and the build ID note added an extra 20 bytes ahead of .text, which happened to align the function to 16 bytes. cbuild doesn't use the build ID (although it should), which happened to leave the function on an 8-byte boundary.
> The disassembly is identical, so I assume the regression is cache- or fast-loop-related. I'm not sure what to do, so let's talk about this at the next performance call.
> -- Michael
Michael,
On 23 August 2012 03:09, Michael Hope <michael.hope@linaro.org> wrote:
> Zhenqiang's been working on the later split2 patch, which causes more constants to be built with a movw/movt pair instead of a constant pool load. There was an unexpected ~10% regression in one benchmark which seems to be due to function alignment. I think we've tracked down the cause, but not yet the fix.
> Compared to the baseline, the split2 branch took 113% of the time to run, i.e. 13% longer. Adding an explicit 16-byte alignment to the function changed this to 97% of the time, i.e. 3% faster. The reason Zhenqiang and I got different results was the build ID: he used the binary build scripts to make the cross compiler, which turn on the build ID, and the build ID note added an extra 20 bytes ahead of .text, which happened to align the function to 16 bytes. cbuild doesn't use the build ID (although it should), which happened to leave the function on an 8-byte boundary.
> The disassembly is identical, so I assume the regression is cache- or fast-loop-related. I'm not sure what to do, so let's talk about this at the next performance call.
I've made a note in the agenda for the performance call, but here are some quick notes/questions that come to mind:
My guesses would include cache-line alignment, and wide Thumb-2 instructions straddling cache-line boundaries and thereby changing core performance.
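One way to check that would be to dump the hot function and look at the byte offsets of the 32-bit Thumb-2 encodings, e.g. (illustrative command; substitute the real binary and symbol):

linaro@Linaro-test:~$ objdump -d t.exe | sed -n '/<main>:/,/^$/p'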
My thoughts on further investigation would be: does the function itself need to be aligned, or is it a hot loop within the function? Can we manually alter the code to choose different instructions, so that none straddle a cache-line boundary? And if so, what happens as we change the alignment?
If the issue is code alignment (either of the function or of the loop) and not instruction sizes, then we can probably do something about it. If it is instruction sizes, then we need to work out a way to mitigate the effects, as GCC doesn't have precise knowledge of instruction sizes.
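If alignment is the issue, one cheap experiment is GCC's existing alignment flags, e.g. (an illustrative invocation against Zhenqiang's test case):

linaro@Linaro-test:~$ gcc -O2 -falign-functions=16 -falign-loops=16 test1.c -o t.exe && time ./t.exe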
Thanks,
Matt