Re: Effect of alignment and peeling on vectorised loops

1 Dec 2011


On Thu, Dec 1, 2011 at 12:20 AM, Ira Rosen <ira.rosen@linaro.org> wrote:
...
On 30 November 2011 02:33, Michael Hope <michael.hope@linaro.org> wrote:
...
I then converted the vld1 and vst1 to specify an alignment of 64
bits. See:
 http://people.linaro.org/~michaelh/incoming/set-alignment.png
This improved the throughput in all cases, and by 14 % in cases of
more than 50 words.  This graph also shows the overhead of the runtime
peeling check.  The blue line is the vectoriser version, which is
slower to pick up due to the greater per-call overhead.
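As a rough illustration of where that per-call overhead comes from, here is a minimal sketch of a peeled loop. It is not the benchmark from this thread and not what the vectoriser emits; the function name add_one_peeled and the 8-byte (64-bit) target are assumptions chosen to match the alignment discussed above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example, not the code measured in this thread.
   Adds 1 to each of n words and returns how many iterations were peeled. */
size_t add_one_peeled(uint32_t *p, size_t n)
{
    size_t i = 0;
    size_t peeled;

    /* Runtime peeling check: scalar iterations until p + i is 8-byte
       aligned.  This test runs on every call, so short loops pay for it. */
    while (i < n && ((uintptr_t)(p + i) & 7u) != 0) {
        p[i] += 1;
        i++;
    }
    peeled = i;

    /* Main body: two words per step.  This stands in for a vector body
       whose loads and stores could carry a 64-bit alignment hint. */
    for (; i + 2 <= n; i += 2) {
        p[i] += 1;
        p[i + 1] += 1;
    }

    /* Epilogue: at most one leftover word. */
    for (; i < n; i++)
        p[i] += 1;

    return peeled;
}
```

For an 8-byte aligned buffer the function peels nothing; starting one word in, it peels exactly one iteration before the aligned body takes over.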
So, the auto-vectorized code doesn't have the alignment hints (peeling
or not peeling), right? Is this what a hint is supposed to look like:
vld1.i64 {d16-d17}, [r1 :"#_128"], or am I looking for the wrong thing?
I had a look in the backend and the vld1/vst1 %A operand adds the
alignment if known.  It correctly adds [r1:64] if I feed in an array
of int64s.  The code checks based on MEM_ALIGN and MEM_SIZE of the
operand:
    align = MEM_ALIGN (x) >> 3;
    memsize = INTVAL (MEM_SIZE (x));
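To make the idea concrete, here is a standalone sketch of how an alignment qualifier might be chosen from those two values. It is not the GCC source: the helper names neon_align_hint and format_operand are made up, and the rule (largest of 256, 128 or 64 bits that divides the known alignment and does not exceed the access size) is a simplifying assumption. The only parts taken from the excerpt are that MEM_ALIGN is in bits and is shifted right by 3 to get bytes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not GCC code.  align_bits is the known alignment
   in bits (as MEM_ALIGN reports it); memsize_bytes is the access size.
   Returns the qualifier in bits, or 0 for "emit no hint". */
unsigned neon_align_hint(unsigned align_bits, unsigned memsize_bytes)
{
    static const unsigned candidates[] = { 256, 128, 64 };
    unsigned align_bytes = align_bits >> 3;   /* bits -> bytes */
    size_t i;

    assert(memsize_bytes > 0);
    if (align_bytes == 0)
        return 0;

    /* Assumed rule: take the largest qualifier that the alignment
       guarantees and that is no wider than the access itself. */
    for (i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        unsigned hint_bytes = candidates[i] / 8;
        if (align_bytes % hint_bytes == 0 && memsize_bytes >= hint_bytes)
            return candidates[i];
    }
    return 0;
}

/* Format the address operand, e.g. "[r1:64]" with a hint or "[r1]" without. */
void format_operand(char *buf, size_t len, unsigned regno,
                    unsigned align_bits, unsigned memsize_bytes)
{
    unsigned hint = neon_align_hint(align_bits, memsize_bytes);

    if (hint != 0)
        snprintf(buf, len, "[r%u:%u]", regno, hint);
    else
        snprintf(buf, len, "[r%u]", regno);
}
```

Under these assumptions, a 16-byte access through r1 with 64-bit known alignment formats as [r1:64], which matches the int64 case above, while an access known only to be 32-bit aligned gets no qualifier.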
Not sure why the backend generates a vldmia instead of a vld1 though.
-- Michael
