Re: Vectoriser performance regression in 4.7

11 Jun 2012


Mans Rullgard <mans.rullgard@linaro.org> wrote:
...
static void ps_hybrid_analysis_ileave_c(float (*out)[32][2],
                                        float L[2][38][64],
                                        int i, int len)
{
    int j;
    for (; i < 64; i++) {
        for (j = 0; j < len; j++) {
            out[i][j][0] = L[0][j][i];
            out[i][j][1] = L[1][j][i];
        }
    }
}

While gcc 4.6 does not attempt to vectorise this at all, 4.7 goes crazy
with a massive slowdown, about 20x slower than non-vectorised with Linaro
4.7 and much worse with FSF 4.7.
Let me know if you need more information.
Thanks for the report; I can reproduce the problem.
There are a number of issues with how GCC chooses to vectorize this loop
that we can potentially improve upon.  However, it would appear that no
matter what, it probably isn't actually helpful to try to vectorize this
loop in the first place.
Fortunately, the vectorizer cost model clearly recognizes this fact (and
will classify this loop as "not vectorized: vector version will never be
profitable").
Unfortunately, it seems that on ARM, the cost model is actually off by
default (it is enabled by default only on i386).
We'll have to enable the cost model on ARM by default as well (and
probably tune it a bit to avoid regressions on other benchmarks).
However, for now, I'd recommend you use -fvect-cost-model when testing
the vectorizer on libav.
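
For anyone who wants to compare builds with and without the cost model, a small correctness harness around the loop may help. The following is only a sketch: the helper ileave_mismatches, the fill pattern and the file name in the comment are hypothetical and not part of libav, and the bounds assertion is an addition. Target-specific options (for example, whatever is needed to enable NEON) are left out, since they are not stated in this thread.

```c
/*
 * Possible builds to compare (ileave.c is a hypothetical file name):
 *   gcc -O3 -ftree-vectorize -c ileave.c
 *   gcc -O3 -ftree-vectorize -fvect-cost-model -c ileave.c
 */
#include <assert.h>
#include <string.h>

/* The loop from the report, plus a bounds assertion on len. */
static void ps_hybrid_analysis_ileave_c(float (*out)[32][2],
                                        float L[2][38][64],
                                        int i, int len)
{
    int j;

    assert(len >= 0 && len <= 32);  /* out has 32 entries per row */
    for (; i < 64; i++) {
        for (j = 0; j < len; j++) {
            out[i][j][0] = L[0][j][i];
            out[i][j][1] = L[1][j][i];
        }
    }
}

/*
 * Hypothetical helper: fill L with distinct non-zero values, run the
 * loop starting at row `start`, and return the number of output
 * elements that differ from the expected interleaving (0 = correct).
 */
static int ileave_mismatches(int start, int len)
{
    static float L[2][38][64];
    static float out[64][32][2];
    int i, j, k, bad = 0;

    for (k = 0; k < 2; k++)
        for (j = 0; j < 38; j++)
            for (i = 0; i < 64; i++)
                L[k][j][i] = (float)(k * 10000 + j * 100 + i + 1);

    memset(out, 0, sizeof(out));
    ps_hybrid_analysis_ileave_c(out, L, start, len);

    for (i = start; i < 64; i++)
        for (j = 0; j < len; j++)
            for (k = 0; k < 2; k++)
                if (out[i][j][k] != L[k][j][i])
                    bad++;
    return bad;
}
```

Calling ileave_mismatches(0, 32) from a test main should return 0 in both builds; timing the same call in a loop would then show whether the cost model avoids the slowdown on a given target.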
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
  Dr. Ulrich Weigand | Phone: +49-7031/16-3727
  STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
  IBM Deutschland Research & Development GmbH
  Vorsitzende des Aufsichtsrats: Martina Koederitz | Geschäftsführung: Dirk
Wittkopp
  Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294
