On Thu, Dec 1, 2011 at 12:20 AM, Ira Rosen <ira.rosen@linaro.org> wrote:
On 30 November 2011 02:33, Michael Hope <michael.hope@linaro.org> wrote:
I then converted the vld1 and vst1 to specify an alignment of 64 bits. See: http://people.linaro.org/~michaelh/incoming/set-alignment.png
This improved the throughput in all cases, and by 14 % for more than 50 words. This graph also shows the overhead of the runtime peeling check. The blue line is the vectoriser version, which is slower to pick up due to the greater per-call overhead.
So, the auto-vectorized code doesn't have the alignment hints (peeling or not peeling), right? Is this what a hint is supposed to look like: vld1.i64 {d16-d17}, [r1 :"#_128"] , or am I looking for the wrong thing?
I had a look in the backend and the vld1/vst1 %A operand adds the alignment if known. It correctly adds [r1:64] if I feed in an array of int64s. The code checks based on MEM_ALIGN and MEM_SIZE of the operand:

    align = MEM_ALIGN (x) >> 3;
    memsize = INTVAL (MEM_SIZE (x));

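To make the check concrete, here is a rough standalone sketch of how a hint could be picked from those two values. It is not the GCC source: neon_align_hint_bits and neon_align_hint_examples are hypothetical names, and the selection rule is an assumption that only uses the :64, :128 and :256 qualifiers that vld1/vst1 accept, choosing the largest one that both the known alignment and the access size allow.

```c
#include <assert.h>

/* Hypothetical helper, not GCC code.
   mem_align_bits: known alignment of the operand in bits (what MEM_ALIGN reports).
   memsize_bytes:  size of the access in bytes (what MEM_SIZE reports).
   Returns the alignment qualifier in bits (64, 128 or 256), or 0 for no hint. */
static unsigned neon_align_hint_bits(unsigned mem_align_bits, unsigned memsize_bytes)
{
    unsigned align_bytes = mem_align_bits >> 3;  /* bits to bytes, as in the snippet above */

    if (align_bytes == 0)
        return 0;                                /* alignment unknown: emit no hint */

    /* Assumption: :256 only for a 32-byte access, :128 only for 16 or 32 bytes. */
    if (memsize_bytes == 32 && align_bytes % 32 == 0)
        return 256;
    if ((memsize_bytes == 16 || memsize_bytes == 32) && align_bytes % 16 == 0)
        return 128;
    if (memsize_bytes >= 8 && align_bytes % 8 == 0)
        return 64;
    return 0;
}

/* Usage examples for the cases discussed in this thread. */
static void neon_align_hint_examples(void)
{
    /* Array of int64s loaded into {d16-d17}: 64-bit alignment, 16 bytes -> [r1:64]. */
    assert(neon_align_hint_bits(64, 16) == 64);
    /* Same load from a 128-bit aligned buffer -> [r1:128]. */
    assert(neon_align_hint_bits(128, 16) == 128);
    /* Only 32-bit alignment known (e.g. an array of ints) -> no hint. */
    assert(neon_align_hint_bits(32, 16) == 0);
}
```

Under these assumptions a plain array of ints never gets a hint, which would be consistent with the auto-vectorized code above having none unless the vectoriser records the alignment it establishes by peeling.
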
Not sure why the backend generates a vldmia instead of a vld1 though.
-- Michael