Vectoriser performance regression in 4.7

11 Jun 2012

While benchmarking the auto-vectoriser on Libav, I noticed a performance
regression in gcc 4.7 (both FSF and Linaro) compared to gcc 4.6 in the AAC
decoder.  I narrowed it down to this function:
static void ps_hybrid_analysis_ileave_c(float (*out)[32][2],
                                        float L[2][38][64],
                                        int i, int len)
{
    int j;
    for (; i < 64; i++) {
        for (j = 0; j < len; j++) {
            out[i][j][0] = L[0][j][i];
            out[i][j][1] = L[1][j][i];
        }
    }
}
While gcc 4.6 does not attempt to vectorise this at all, 4.7 does, and
the result is a massive slowdown: about 20x slower than the non-vectorised
code with Linaro 4.7, and much worse with FSF 4.7.
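For anyone wanting to reproduce this outside Libav, something along these
lines should do as a standalone test case.  It is only a sketch: the
function is the one above, but the buffers (in_buf, out_buf) and the
helpers fill_input, count_mismatches and time_ileave are hypothetical
names made up for illustration, and the empty asm statement is a
GCC-specific compiler barrier.

```c
#include <assert.h>
#include <time.h>

/* The function from the report, unchanged. */
static void ps_hybrid_analysis_ileave_c(float (*out)[32][2],
                                        float L[2][38][64],
                                        int i, int len)
{
    int j;
    for (; i < 64; i++) {
        for (j = 0; j < len; j++) {
            out[i][j][0] = L[0][j][i];
            out[i][j][1] = L[1][j][i];
        }
    }
}

/* Hypothetical test buffers, sized to match the prototype. */
static float in_buf[2][38][64];
static float out_buf[64][32][2];

/* Hypothetical helper: give every input element a distinct value
 * that is exactly representable as a float. */
static void fill_input(void)
{
    int c, j, i;
    for (c = 0; c < 2; c++)
        for (j = 0; j < 38; j++)
            for (i = 0; i < 64; i++)
                in_buf[c][j][i] = (float)(c * 10000 + j * 100 + i);
}

/* Hypothetical helper: count output elements that differ from the
 * expected interleaving, so a miscompile would also show up. */
static int count_mismatches(int start, int len)
{
    int i, j, bad = 0;
    assert(len >= 0 && len <= 32);
    for (i = start; i < 64; i++)
        for (j = 0; j < len; j++) {
            bad += out_buf[i][j][0] != in_buf[0][j][i];
            bad += out_buf[i][j][1] != in_buf[1][j][i];
        }
    return bad;
}

/* Hypothetical helper: CPU seconds spent in `reps` calls. */
static double time_ileave(int start, int len, long reps)
{
    long r;
    clock_t t0 = clock();
    for (r = 0; r < reps; r++) {
        ps_hybrid_analysis_ileave_c(out_buf, in_buf, start, len);
        /* GCC-specific barrier: keeps the stores from being
         * optimised away between iterations. */
        __asm__ volatile("" ::: "memory");
    }
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Calling fill_input() and then time_ileave(0, 32, reps) from a small main(),
built once with -O3 and once with -O3 -fno-tree-vectorize, should give the
two numbers to compare.  The values 0 and 32 for i and len are my choice
here and may not match what the AAC decoder actually passes.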
Let me know if you need more information.
-- 
Mans Rullgard / mru
