Re: Vectorised copy

7 Sep 2011


On Wed, Sep 7, 2011 at 2:14 AM, Richard Sandiford <richard.sandiford@linaro.org> wrote:
...
Michael Hope michael.hope@linaro.org writes:
...
While out benchmarking today, I ran across code similar to this:
int *a;
int *b;
int *c;
const int ad[320];
const int bd[320];
const int cd[320];
void fill()
{
  for (int i = 0; i < 320; i++)
    {
      a[i] = ad[i];
      b[i] = bd[i];
      c[i] = cd[i];
    }
}
I was surprised and happy to see the vectoriser kick in for the copy.
The inner loop looks like:
      add     r5, r3, ip
      adds    r4, r3, r7
      vldmia  r2!, {d16-d17}
      vldmia  r1!, {d18-d19}
      adds    r0, r3, r6
      vst1.32 {q9}, [r5]
      vst1.32 {q8}, [r4]
      vldmia  r3, {d16-d17}
      adds    r3, r3, #16
      cmp     r3, r8
      vst1.32 {q8}, [r0]
      bne     .L3
so r3 is the loop variable and {ip,r7,r6} are the offsets from r3 to the
destination pointers.  Adding a __restrict doesn't change the code.
FWIW, this comes from ivopts.  I raised the "problem" on gcc@
a few months back, but it seems to be intentional behaviour:
http://gcc.gnu.org/ml/gcc/2011-07/msg00050.html
That is, all things being equal, the current code tends to prefer
cases where it can hoist the difference between potential ivs
rather than creating separate ivs.
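To make the two shapes concrete, here is a rough hand-written sketch in C.
It is not what GCC emits, only an illustration of "separate ivs" versus
"one iv plus hoisted differences".  The arena layout and all names
(fill_separate_ivs, fill_hoisted_offsets, shapes_agree, the AD..C region
constants) are hypothetical.  All six regions sit in a single array so that
the offsets between them are well-defined C.

```c
#include <stddef.h>

/* Hypothetical layout: three source regions, then three destinations. */
enum { N = 320, AD = 0, BD = N, CD = 2 * N, A = 3 * N, B = 4 * N, C = 5 * N };

/* Shape 1: every pointer is its own induction variable. */
static void fill_separate_ivs(int *m)
{
  int *a = m + A, *b = m + B, *c = m + C;
  const int *ad = m + AD, *bd = m + BD, *cd = m + CD;
  for (int i = 0; i < N; i++)
    {
      *a++ = *ad++;
      *b++ = *bd++;
      *c++ = *cd++;
    }
}

/* Shape 2: the destinations are addressed as "iv + loop-invariant offset".
   p plays the role of r3 in the assembly above; the three offsets play
   the role of ip, r7 and r6. */
static void fill_hoisted_offsets(int *m)
{
  const int *ad = m + AD, *bd = m + BD;
  int *p = m + CD;
  const ptrdiff_t off_a = A - CD;   /* computed once, outside the loop */
  const ptrdiff_t off_b = B - CD;
  const ptrdiff_t off_c = C - CD;
  for (int *end = p + N; p != end; p++)
    {
      p[off_a] = *ad++;
      p[off_b] = *bd++;
      p[off_c] = *p;
    }
}

/* Returns 1 if both shapes leave their arenas in the same state. */
int shapes_agree(void)
{
  static int m1[6 * N], m2[6 * N];
  for (int i = 0; i < 3 * N; i++)
    m1[i] = m2[i] = i * 7 + 1;      /* arbitrary source data */
  fill_separate_ivs(m1);
  fill_hoisted_offsets(m2);
  for (int i = 0; i < 6 * N; i++)
    if (m1[i] != m2[i])
      return 0;
  return m1[A] == m1[AD] && m1[C + N - 1] == m1[CD + N - 1];
}
```

Shape 2 keeps fewer values live across iterations that need updating, at
the cost of an extra add per store address, which matches the add/adds
instructions feeding each vst1.32 in the loop above.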
As far as the end of today's meeting goes: ivopts is one of those
things on my unwritten list of areas that it would be nice to look at.
I posted some benchmarks comparing -fivopts with -fno-ivopts to the
benchmark list in July.  As expected, ivopts does help in a lot of cases,
but there were also a fair number of cases where turning it off
significantly improved performance.
Spawned into:
 https://blueprints.launchpad.net/gcc-linaro/+spec/investigate-ivopts
-- Michael
