Hi,

I started to look into mixed vector sizes within the same loop. My main reason was to allow widening and narrowing instructions, which have different vector sizes for source and destination, to work properly. My example was widen_mult (int = short * short), whose implementation I thought was not optimal. But now that I have a working GCC mainline for ARM, I see that it works just fine.

short ua[], ub[];
int c[];
for (i = 0; i < n; i++)
    c[i] = ub[i] * ua[i];

is compiled as:

.L11:
        add     r1, r1, #1
        vldmia  r4!, {d18-d19}
        cmp     r5, r1
        vldmia  ip!, {d16-d17}
        vmull.s16 q10, d18, d16
        vstr    d20, [r3, #-32]
        vstr    d21, [r3, #-24]
        vmull.s16 q8, d19, d17
        vstr    d16, [r3, #-16]
        vstr    d17, [r3, #-8]
        add     r3, r3, #32
        bhi     .L11

which looks good to me at least from the vmull point of view.
Does anyone have an example where mixed vector size instructions are not used properly?
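
For anyone who wants to reproduce the widen_mult case, the loop above can be wrapped into a self-contained test roughly as follows. This is only a sketch: the function names, the restrict qualifiers, the array length and the input values are assumptions made for illustration, not part of the original test. Building it for an ARM target with something like -O3 -mfpu=neon -mfloat-abi=softfp should let the vectorizer see the int = short * short pattern, but the exact flags used for the listing above are not stated, so treat those as a guess too.

```c
#include <assert.h>
#include <stddef.h>

/* Widening multiply: 16-bit inputs, 32-bit results.
   The restrict qualifiers are an assumption added here so the
   compiler does not have to emit runtime alias checks. */
void widen_mult(int *restrict c, const short *restrict ua,
                const short *restrict ub, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = ua[i] * ub[i];
}

/* Hypothetical self-check: compares the loop's results against an
   element-by-element recomputation of the same product.
   Returns 1 if every element matches; asserts otherwise. */
int widen_mult_self_check(void)
{
    enum { N = 64 };
    short ua[N], ub[N];
    int c[N];

    /* Arbitrary inputs that stay inside the range of short. */
    for (int i = 0; i < N; i++) {
        ua[i] = (short)(i * 521 - 16000);
        ub[i] = (short)(16000 - i * 499);
    }

    widen_mult(c, ua, ub, N);

    for (int i = 0; i < N; i++)
        assert(c[i] == (int)ua[i] * (int)ub[i]);

    return 1;
}
```

Calling widen_mult_self_check() from main and inspecting the assembly generated for widen_mult is one way to check whether vmull.s16 (or the equivalent widening multiply on another target) is being selected.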

Another reason for mixed sizes could be cases where only part of the loop can be vectorized with the wider vectors. I don't know how common this is.

Are there any other reasons to implement mixed vector sizes? I understand that this can be a useful feature, but I am not sure it's the most important one.

Thanks,
Ira