On 11 June 2012 17:34, Ulrich Weigand Ulrich.Weigand@de.ibm.com wrote:
Mans Rullgard mans.rullgard@linaro.org wrote:
static void ps_hybrid_analysis_ileave_c(float (*out)[32][2], float L[2][38][64],
                                        int i, int len)
{
    int j;

    for (; i < 64; i++) {
        for (j = 0; j < len; j++) {
            out[i][j][0] = L[0][j][i];
            out[i][j][1] = L[1][j][i];
        }
    }
}
While gcc 4.6 does not attempt to vectorise this at all, 4.7 goes crazy and vectorises it with a massive slowdown: the result is about 20x slower than the non-vectorised version with Linaro 4.7, and much worse still with FSF 4.7.
Let me know if you need more information.
Thanks for the report; I can reproduce the problem.
There are a number of issues with how GCC chooses to vectorize this loop that we could potentially improve upon. However, it appears that no matter what, it probably isn't actually helpful to vectorize this loop in the first place.
It could be beneficial to merge the two adjacent 32-bit stores into a single 64-bit store. In this particular case the destination is actually 64-bit aligned, although there's no way for gcc to know this.
Fortunately, the vectorizer cost model clearly recognizes this fact (and will classify this loop as "not vectorized: vector version will never be profitable").
Unfortunately, it seems that on ARM, the cost model is actually off by default (it is enabled by default only on i386).
We'll have to enable the cost model on ARM by default as well (and probably tune it a bit to avoid regressions on other benchmarks).
However, for now, I'd recommend you use -fvect-cost-model when testing the vectorizer on libav.
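For reference, an invocation of a single file might look like the following; the cross-compiler triplet, -mfpu setting, and file name are illustrative, and only -fvect-cost-model is the flag recommended above:

```shell
# Illustrative cross-build of one translation unit; adjust the
# toolchain triplet, FPU setting, and file name to your setup.
arm-linux-gnueabihf-gcc -O3 -mfpu=neon -ftree-vectorize \
    -fvect-cost-model -c file.c -o file.o
```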
I'll add that flag and see what happens. Any other flags I should be using?