On 30 November 2011 22:28, Michael Hope <michael.hope@linaro.org> wrote:
This run also showed the effect of loop unrolling. The loop seems to be fully unrolled for trip counts of <= 64 words, and performance drops off past around 8 words. When the unrolling finally drops out, performance increases by 101%.
I see register spills starting from COUNT=36.
Ah. Does the vectoriser cost model take register pressure into account? If so, how can I turn this on?
No, but the vectorizer doesn't perform this loop unrolling either. The unrolling here is done by the complete_unroll pass, which runs after vectorization, and AFAIK it doesn't take register pressure into account.
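For concreteness, a minimal sketch of the kind of loop being measured (the actual benchmark isn't quoted in the thread, so COUNT, the element type and the -O3 option are assumptions):

    #include <stdint.h>

    #define COUNT 36   /* assumed: the word count swept in the benchmark */

    /* Copy COUNT 32-bit words.  With vectorization enabled (e.g. -O3 for
       NEON) the loop is vectorized first; because the trip count is a
       compile-time constant, the complete_unroll (cunroll) pass that runs
       afterwards then fully unrolls it for small COUNT values.  */
    void copy_words (uint32_t *restrict dst, const uint32_t *restrict src)
    {
      for (int i = 0; i < COUNT; i++)
        dst[i] = src[i];
    }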
On 1 December 2011 02:40, Michael Hope <michael.hope@linaro.org> wrote:
I had a look in the backend and the vld1/vst1 %A operand adds the alignment if known. It correctly adds [r1:64] if I feed in an array of int64s. The code checks based on MEM_ALIGN and MEM_SIZE of the operand:

    align = MEM_ALIGN (x) >> 3;
    memsize = INTVAL (MEM_SIZE (x));
Not sure why the backend generates a vldmia instead of a vld1 though.
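A small test case along these lines (assumed, not the one from the run) is enough to see the hint:

    #include <stdint.h>

    /* Assumed test case: with int64_t elements, MEM_ALIGN on the data refs
       is known to be 64 bits, so the %A operand can append the [rN:64]
       alignment qualifier to the vld1/vst1 addresses used by the
       vectorized loop.  */
    void copy_i64 (int64_t *restrict dst, const int64_t *restrict src)
    {
      for (int i = 0; i < 64; i++)
        dst[i] = src[i];
    }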
I don't see how the alignment info set by the vectorizer influences MEM_ALIGN. The vectorizer sets the align and misalign fields of struct ptr_info_def, and I see that used in expand_expr_real_1, for MEM_REF, only to decide whether movmisalign is needed (for unaligned accesses).
MEM_ALIGN is determined in set_mem_attributes_minus_bitpos from DECL_ALIGN or TYPE_ALIGN. For the cases where the vectorizer forces alignment, this should work, since we then set DECL_ALIGN (in vect_compute_data_ref_alignment). But peeling obviously doesn't change DECL_ALIGN, so I don't understand how we can create an alignment hint in that case with the current code.
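To illustrate the distinction (a hedged sketch, not code from the thread): for arrays the vectorizer can force the alignment through DECL_ALIGN, so MEM_ALIGN carries it and the hint can be printed; for arbitrary pointers it can only peel, which establishes alignment at run time without updating MEM_ALIGN:

    #include <stdint.h>

    uint32_t a[64], b[64];   /* arrays: the vectorizer can raise DECL_ALIGN
                                (vect_compute_data_ref_alignment), so
                                MEM_ALIGN reflects the vector alignment  */

    void add_arrays (void)
    {
      for (int i = 0; i < 64; i++)
        a[i] += b[i];
    }

    /* Pointer arguments: alignment is unknown at compile time, so the
       vectorizer may peel a prologue until one of the accesses becomes
       vector-aligned.  That makes the accesses aligned dynamically, but
       nothing updates MEM_ALIGN, so no [rN:...] hint gets printed.  */
    void add_ptrs (uint32_t *restrict p, const uint32_t *restrict q)
    {
      for (int i = 0; i < 64; i++)
        p[i] += q[i];
    }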
Ira
-- Michael