Vector-alignment patch performance regressions - linaro-toolchain

8 Aug 2012


Hello,

I've had a look at the mp3player performance regressions (just with *some*
data sets) with the vector-alignment patch.  Interestingly it turns out
that the patch basically does not change the generated code for the hot
spot (inv_mdct routine) at all.  (The *only* change is which bits of the
incoming pointer the run-time alignment check generated by the vectorizer
tests for.  But this has no practical consequences, since the check itself
is not hot, and the *decision* made by the check is the same anyway --
everything is in fact properly aligned at runtime.)
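
To make the point about the run-time check concrete, here is a minimal
sketch of what such a check amounts to. It is not the code the vectorizer
actually emits: the helper names, the buffer, and the choice of 8 versus 16
bytes as the two tested alignments are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p is aligned to `align` bytes.
   `align` must be a power of two, so the test is a mask of the low bits. */
static int is_aligned(const void *p, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical 16-byte aligned buffer standing in for the data that
   reaches the hot loop (GCC-style attribute). */
static float sample_block[36] __attribute__((aligned(16)));

/* Nonzero when a check on the low 3 bits and a check on the low 4 bits
   select the same code path for p (assumed masks, for illustration). */
static int checks_agree(const void *p)
{
    return is_aligned(p, 8) == is_aligned(p, 16);
}
```

For a buffer like sample_block, checks_agree() is true, which matches the
situation described above: the two variants of the check only diverge for
an address that is 8-byte but not 16-byte aligned.
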

The other difference introduced by the vector-alignment patch, outside of
the generated code, is that some global arrays that used to be forcibly
aligned to 16 bytes by the vectorizer are now only aligned to 8 bytes.  To
check whether this makes a difference, I've hacked the compiler to always
force all global arrays to be 16-byte aligned.  And interestingly enough,
this appears to fix this particular performance regression ...
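
For anyone who wants to try the same experiment without patching the
compiler, a rough source-level equivalent is sketched below. This is not
the hack described above, and the array names, element type, and sizes are
hypothetical stand-ins rather than the actual mp3player tables; the
attribute is the GCC `aligned` variable attribute.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the decoder's global tables. */
float table_default[36];                              /* default alignment */
float table_forced[36] __attribute__((aligned(16)));  /* forced to 16 bytes */

/* Returns the largest power-of-two alignment (capped at 64) that the
   address p actually satisfies, by scanning its low bits. */
static unsigned alignment_of_ptr(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    unsigned align = 1;

    while (align < 64 && (a & align) == 0)
        align <<= 1;
    return align;
}
```

Calling alignment_of_ptr() on each global at startup shows what alignment
the arrays really ended up with; table_forced should always report at least
16, while table_default depends on the compiler and linker.
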

Any thoughts as to why this might be the case?  What are the
recommendations on the ARM hardware side as to what alignment is preferred?

Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
  Dr. Ulrich Weigand | Phone: +49-7031/16-3727
  STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
  IBM Deutschland Research & Development GmbH
  Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung: Dirk Wittkopp
  Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294