Hello,
I've had a look at the mp3player performance regressions (seen only with *some* data sets) with the vector-alignment patch. Interestingly, it turns out that the patch leaves the generated code for the hot spot (the inv_mdct routine) essentially unchanged. The *only* change is which bits of the incoming pointer the vectorizer's run-time alignment check tests. This has no practical consequences, since the check itself is not hot, and the *decision* made by the check is the same anyway: everything is in fact properly aligned at runtime.
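To illustrate what "which bits of the incoming pointer are tested" means, here is a minimal sketch of such a run-time alignment check. It is not the code the vectorizer actually emits. The helper name is_aligned is hypothetical, and the 8-byte and 16-byte values mentioned below are assumed only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if p is aligned to `align` bytes.
 * `align` must be a power of two, so that (align - 1) is a mask of
 * exactly the low address bits that must be zero. */
bool is_aligned(const void *p, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

With align = 16 the check tests the low four address bits, with align = 8 only the low three. For a pointer that is in fact 16-byte aligned both checks give the same answer, which is why the different mask does not change which code path runs.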
The other difference introduced by the vector-alignment patch lies outside the code: some global arrays used to be forcibly aligned to 16 bytes by the vectorizer, and they are now only aligned to 8 bytes. To check whether this makes a difference, I've modified the compiler, as a hack, to always force all global arrays to be 16-byte aligned. Interestingly enough, this appears to fix this particular performance regression ...
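For anyone who wants to try the same experiment without patching the compiler, the effect can be approximated per array at source level with GCC's aligned attribute. This is only a sketch: the array names and sizes are made up and are not taken from mp3player.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the decoder's global arrays. */
float table_default[36];                             /* whatever alignment the ABI/compiler picks */
float table_forced[36] __attribute__((aligned(16))); /* explicitly forced to 16 bytes */

/* Low four address bits: 0 means the object starts on a 16-byte boundary. */
unsigned low4(const void *p)
{
    return (unsigned)((uintptr_t)p & 15u);
}
```

low4(table_forced) is always 0, while low4(table_default) may come out as 0 or 8 (or another value, depending on the target's default alignment) once the vectorizer no longer raises the alignment, so printing it for the arrays touched by inv_mdct shows which of them actually moved.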
Any thoughts as to why this might be the case? What are the recommendations on the ARM hardware side as to what alignment is preferred?
Best regards,
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
Chair of the Supervisory Board: Martina Koederitz | Management: Dirk Wittkopp
Registered office: Böblingen | Court of registration: Amtsgericht Stuttgart, HRB 243294