Mans Rullgard <mans.rullgard@linaro.org> wrote:
static void ps_hybrid_analysis_ileave_c(float (*out)[32][2], float L[2][38][64],
                                        int i, int len)
{
    int j;

    for (; i < 64; i++) {
        for (j = 0; j < len; j++) {
            out[i][j][0] = L[0][j][i];
            out[i][j][1] = L[1][j][i];
        }
    }
}
While gcc 4.6 does not attempt to vectorise this at all, 4.7 goes crazy with a massive slowdown, about 20x slower than non-vectorised with Linaro 4.7 and much worse with FSF 4.7.
Let me know if you need more information.
Thanks for the report; I can reproduce the problem.
There are a number of issues with how GCC chooses to vectorize this loop that we can potentially improve upon. However, it would appear that, no matter what, it probably isn't actually helpful to try to vectorize this loop in the first place.
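The message does not spell out the individual issues. One property that can be read directly off the quoted code, offered here only as an illustration and not as the diagnosis, is the access pattern of the inner loop: for a fixed i, each step of j moves the reads from L forward by a whole 64-float row, while the writes to out advance by one adjacent pair. A minimal sketch that computes both steps from the array types in the function signature (the helper names are hypothetical):

```c
#include <stddef.h>

/* Floats between L[c][j][i] and L[c][j + 1][i]: the step taken by each
 * read on consecutive iterations of the inner (j) loop.  L is declared
 * as float L[2][38][64], so one step is a full row of 64 floats. */
size_t ileave_read_stride(void)
{
    return sizeof(float[64]) / sizeof(float);
}

/* Floats between out[i][j][0] and out[i][j + 1][0]: the step taken by
 * the writes on consecutive iterations of the inner (j) loop.  out is
 * declared as float (*out)[32][2], so one step is a pair of floats. */
size_t ileave_write_stride(void)
{
    return sizeof(float[2]) / sizeof(float);
}
```

With 4-byte floats, that is a 256-byte step between consecutive reads against an 8-byte step between consecutive writes.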
Fortunately, the vectorizer cost model clearly recognizes this fact (and will classify this loop as "not vectorized: vector version will never be profitable").
Unfortunately, it seems that on ARM, the cost model is actually off by default (it is enabled by default only on i386).
We'll have to enable the cost model on ARM by default as well (and probably tune it a bit to avoid regressions on other benchmarks).
However, for now, I'd recommend you use -fvect-cost-model when testing the vectorizer on libav.
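To compare builds with and without -fvect-cost-model outside of libav, a small standalone harness around the reported function may help. The following is only a sketch: check_ileave and time_ileave are hypothetical helpers, the start index 0 and length 32 are arbitrary choices rather than the values libav uses, and clock() gives only coarse CPU timings.

```c
#include <time.h>

/* Kernel as quoted in the report. */
static void ps_hybrid_analysis_ileave_c(float (*out)[32][2], float L[2][38][64],
                                        int i, int len)
{
    int j;

    for (; i < 64; i++) {
        for (j = 0; j < len; j++) {
            out[i][j][0] = L[0][j][i];
            out[i][j][1] = L[1][j][i];
        }
    }
}

/* Hypothetical helper: returns 1 if the kernel output matches the input
 * element by element, 0 otherwise.  Useful for confirming that two
 * builds with different flags still compute the same result. */
int check_ileave(void)
{
    static float L[2][38][64];
    static float out[64][32][2];
    int c, i, j;

    /* Give every input element a distinct, exactly representable value. */
    for (c = 0; c < 2; c++)
        for (j = 0; j < 38; j++)
            for (i = 0; i < 64; i++)
                L[c][j][i] = (float)(c * 10000 + j * 100 + i);

    ps_hybrid_analysis_ileave_c(out, L, 0, 32);

    for (i = 0; i < 64; i++)
        for (j = 0; j < 32; j++)
            if (out[i][j][0] != L[0][j][i] || out[i][j][1] != L[1][j][i])
                return 0;
    return 1;
}

/* Hypothetical helper: runs the kernel `reps` times and returns the
 * elapsed CPU time in seconds.  Because the kernel lives in the same
 * translation unit, the compiler may inline it; a more careful benchmark
 * would place it in a separate file. */
double time_ileave(int reps)
{
    static float L[2][38][64];
    static float out[64][32][2];
    volatile float sink = 0.0f;
    clock_t start;
    int r;

    start = clock();
    for (r = 0; r < reps; r++) {
        L[0][0][0] = (float)r;          /* vary the input between runs */
        ps_hybrid_analysis_ileave_c(out, L, 0, 32);
        sink += out[0][0][0];           /* keep the result observable */
    }
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling check_ileave() and then time_ileave() with a large repetition count from a small main(), once in a build with the vectorizer enabled and once more with -fvect-cost-model added to the same flags, should show whether the cost model keeps the loop scalar on a given toolchain.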
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
  Dr. Ulrich Weigand | Phone: +49-7031/16-3727
  STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
  IBM Deutschland Research & Development GmbH
  Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung: Dirk Wittkopp
  Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294