Re: Effect of alignment and peeling on vectorised loops

30 Nov 2011


On Thu, Dec 1, 2011 at 12:20 AM, Ira Rosen <ira.rosen@linaro.org> wrote:
...
On 30 November 2011 02:33, Michael Hope <michael.hope@linaro.org> wrote:
...
I then converted the vld1 and vst1 to specify an alignment of 64
bits. See:
 http://people.linaro.org/~michaelh/incoming/set-alignment.png
This improved throughput in all cases, and by 14 % for sizes above 50
words.  This graph also shows the overhead of the runtime
peeling check.  The blue line is the vectoriser version, which is
slower to pick up due to the greater per-call overhead.
So, the auto-vectorized code doesn't have the alignment hints (peeling
or not peeling), right? Is this what a hint is supposed to look like:
vld1.i64 {d16-d17}, [r1 :"#_128"] , or am I looking for the wrong thing?
Yip.  We currently use a vldmia r1!, {d16-d17} which (on the A9 at
least) only works for aligned values and takes the same time as the
unaligned-friendly vld1.i64 {d16-d17}, [r1]!
...
I thought that peeling should be useful at least for the hints.
Peeling and using the vld1.i64 {d16-d17}, [r1:64]! form should be
faster for larger loops.  For some reason vld1.i64 ..., [r1:128] gives
an illegal instruction trap on my board.  Note that the :128 is in
bits.
...
...
I then went back to the vectoriser and changed the alignment of the
struct to cause peeling to turn on and off.  See:
 http://people.linaro.org/~michaelh/incoming/unroll.png
At 200 words, the version without peeling is 2.9 % faster.  This is
partly due to a fixed count loop turning into a runtime count due to
unknown alignment.
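
A minimal sketch of why unknown alignment turns a fixed count into a
runtime count.  This is not the code GCC generates; peel_count and
set_words are hypothetical names, and plain C stores stand in for the
NEON instructions.  The number of scalar iterations to peel depends on
the pointer value, so the trip count of the main loop is only known at
run time:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of leading words to handle one at a time so that the rest of
   the buffer starts on an `align`-byte boundary.  `align` must be a
   power of two; p is assumed to be aligned to the element size. */
static size_t peel_count(const uint32_t *p, size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t misalign = (size_t)((uintptr_t)p & (align - 1));
    size_t peel = misalign ? (align - misalign) / sizeof *p : 0;
    return peel < n ? peel : n;   /* never peel more than the buffer holds */
}

/* Set n words to value using a peeled prologue, a 4-word main loop and
   a scalar epilogue.  Each main-loop iteration stands in for a single
   aligned 128-bit store. */
static void set_words(uint32_t *p, size_t n, uint32_t value)
{
    size_t i = 0;
    size_t peel = peel_count(p, n, 16);

    for (; i < peel; i++)         /* prologue: runs until p + i is aligned */
        p[i] = value;

    for (; i + 4 <= n; i += 4) {  /* main loop: trip count depends on peel */
        p[i + 0] = value;
        p[i + 1] = value;
        p[i + 2] = value;
        p[i + 3] = value;
    }

    for (; i < n; i++)            /* epilogue: leftover words */
        p[i] = value;
}
```

With a pointer known to be 16-byte aligned, peel is zero and the main
loop runs exactly n / 4 times, which the compiler can treat as a
constant when n is one.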
This run also showed the effect of loop unrolling.  The loop seems to
be unrolled for loops of <= 64 words, and performance drops off past
around 8 words.  When the unrolling finally drops out, performance
increases by 101 %.
I see register spills starting from COUNT=36.
Ah.  Does the vectoriser cost model take register pressure into
account?  How can I turn this on?
-- Michael
