Hi Ramana. You were right that you can use ordinary arithmetic operators directly on the NEON intrinsic vector types. Instead of writing this (admittedly made-up) example:
int16x4_t foo2(int16x4_t a, int16x4_t b)
{
    int16x4_t ca = vdup_n_s16(0.2126*256);
    int16x4_t cb = vdup_n_s16(0.7152*256);
    return vadd_s16(vmul_s16(ca, a), vmul_s16(cb, b));
}
you can do:
int16x4_t foo3(int16x4_t a, int16x4_t b)
{
    int16x4_t ca = vdup_n_s16(0.2126*256);
    int16x4_t cb = vdup_n_s16(0.7152*256);
    return ca*a + cb*b;
}
which is more readable and, as a bonus, lets the compiler emit the multiply-accumulate instruction that I missed when writing the intrinsics by hand. Nice.
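For anyone who wants to try this without an ARM board, here is a minimal sketch, assuming GCC or Clang, that uses the compilers' generic vector extensions. As far as I know, those extensions are the mechanism that lets int16x4_t accept ordinary operators in the first place. The type v4i16 and the helpers splat, lane and blend are hypothetical names made up for illustration; splat stands in for vdup_n_s16.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generic vector type: four int16_t lanes in 8 bytes, the same
   shape as NEON's int16x4_t. Requires GCC or Clang vector extensions. */
typedef int16_t v4i16 __attribute__((vector_size(8)));

/* Broadcast one scalar into all four lanes (portable stand-in for vdup_n_s16). */
static v4i16 splat(int16_t c)
{
    v4i16 v = {c, c, c, c};
    return v;
}

/* Read lane i; the assert guards against out-of-range lane indices. */
static int16_t lane(v4i16 v, int i)
{
    assert(i >= 0 && i < 4);
    return v[i];
}

/* Same expression as foo3: * and + operate lane by lane on vector types,
   with no promotion to int, so each lane stays int16_t. */
static v4i16 blend(v4i16 a, v4i16 b)
{
    v4i16 ca = splat((int16_t)(0.2126 * 256)); /* 54.4256 truncates to 54 */
    v4i16 cb = splat((int16_t)(0.7152 * 256)); /* 183.0912 truncates to 183 */
    return ca * a + cb * b;
}
```

On an ARM target you would keep int16x4_t and vdup_n_s16 as in foo3. The hand-written intrinsic for the multiply-accumulate is vmla_s16(acc, x, y), which computes acc + x*y per lane, so foo2 could also have been written as vmla_s16(vmul_s16(ca, a), cb, b).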
-- Michael