Hi,
I started to look into mixed vector sizes (in the same loop). My main reason was to make widening and narrowing instructions, which have different vector sizes for source and destination, work properly. My example was widen_mult (int = short * short); I thought its implementation was not optimal. But now that I have a working GCC mainline build for ARM, I see that it works just fine.
  short ua[], ub[];
  int c[];

  for (i = 0; i < n; i++)
    c[i] = ub[i] * ua[i];
is compiled as:
  .L11:
        add      r1, r1, #1
        vldmia   r4!, {d18-d19}
        cmp      r5, r1
        vldmia   ip!, {d16-d17}
        vmull.s16  q10, d18, d16
        vstr     d20, [r3, #-32]
        vstr     d21, [r3, #-24]
        vmull.s16  q8, d19, d17
        vstr     d16, [r3, #-16]
        vstr     d17, [r3, #-8]
        add      r3, r3, #32
        bhi      .L11
which looks good to me, at least from the vmull point of view. Does anyone have an example where mixed-vector-size instructions are not handled properly?
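To make the size mix in that loop explicit, here is a plain C sketch of what one iteration does, under the assumption that short is 16 bits and int is 32 bits (as on ARM). Each iteration loads 8 shorts from each input (one 128-bit register each), and two widening multiplies each produce 4 ints (one 128-bit register each), so the destination needs two vectors per source vector. The helper names are hypothetical, and vmull_s16_emul is a scalar emulation of the lane semantics, not the arm_neon.h intrinsic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar emulation of one vmull.s16: four 16-bit lanes
   from each source, four 32-bit products in the destination. */
static void vmull_s16_emul(int32_t dst[4], const int16_t a[4], const int16_t b[4])
{
    for (int lane = 0; lane < 4; lane++)
        dst[lane] = (int32_t)a[lane] * (int32_t)b[lane];
}

/* Hypothetical model of the .L11 loop: c[i] = ub[i] * ua[i]. */
void widen_mult_loop(int32_t *c, const int16_t *ua, const int16_t *ub, size_t n)
{
    assert(n == 0 || (c != NULL && ua != NULL && ub != NULL));

    size_t i = 0;
    /* Vector part: 8 shorts per input per iteration (one q register each). */
    for (; i + 8 <= n; i += 8) {
        vmull_s16_emul(&c[i], &ub[i], &ua[i]);             /* low 4 lanes  */
        vmull_s16_emul(&c[i + 4], &ub[i + 4], &ua[i + 4]); /* high 4 lanes */
    }
    /* Scalar epilogue for the remaining n % 8 elements. */
    for (; i < n; i++)
        c[i] = (int32_t)ub[i] * (int32_t)ua[i];
}
```

The low/high split mirrors the two vmull.s16 instructions in the loop, each taking one d register (4 shorts) per operand and writing one q register (4 ints).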
Another reason for mixed sizes could be cases where only part of the loop can be vectorized with the wider vectors. I don't know how common this is.
Are there any other reasons to implement mixed vector sizes? I understand that this can be a useful feature; I am just not sure it's the most important one.
Thanks,
Ira