On 06/01/12 02:17, Michael Hope wrote:
> Hi Ramana.  You were right about being able to do operations on
> intrinsic types.  Instead of doing the admittedly made up:
>
> int16x4_t foo2(int16x4_t a, int16x4_t b)
> {
>   int16x4_t ca = vdup_n_s16(0.2126*256);
>   int16x4_t cb = vdup_n_s16(0.7152*256);
>
>   return vadd_s16(vmul_s16(ca, a), vmul_s16(cb, b));
> }
>
> you can do:
>
> int16x4_t foo3(int16x4_t a, int16x4_t b)
> {
>   int16x4_t ca = vdup_n_s16(0.2126*256);
>   int16x4_t cb = vdup_n_s16(0.7152*256);
>
>   return ca*a + cb*b;
> }
>
> which is more readable and, as an added bonus, generates the
> multiply-and-accumulate that I missed when using intrinsics.  Nice.
This is a GCC extension. It's not portable, and in particular it's not supported by ARM's own compiler.
There are also difficulties on big-endian targets if you start operating on the vectors directly: GCC's own interpretation of vectors diverges to some degree from the intrinsics' view, so mixing and matching the two can lead to subtle problems with lane numbering.
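For anyone who wants to try the operator form without an ARM toolchain, here is a minimal sketch of the same arithmetic as foo3. It uses GCC's generic vector_size attribute rather than arm_neon.h, on the assumption that the operators behave the same way there as on int16x4_t. The names v4i16 and weighted_sum are made up for illustration, and the sketch does not try to demonstrate the big-endian lane-numbering issue.

```c
#include <assert.h>
#include <stdint.h>

/* The vector operators below are a GNU C extension, so refuse to build
   on compilers that do not claim GNU compatibility. */
#if !defined(__GNUC__)
#error "This sketch relies on GNU C vector extensions"
#endif

/* Hypothetical stand-in for int16x4_t: four 16-bit lanes in 8 bytes. */
typedef int16_t v4i16 __attribute__((vector_size(8)));

static_assert(sizeof(v4i16) == 4 * sizeof(int16_t),
              "expected four 16-bit lanes");

/* Same computation as foo3: ca*a + cb*b, applied lane by lane. */
static v4i16 weighted_sum(v4i16 a, v4i16 b)
{
    const int16_t wa = (int16_t)(0.2126 * 256);  /* truncates to 54 */
    const int16_t wb = (int16_t)(0.7152 * 256);  /* truncates to 183 */
    v4i16 ca = { wa, wa, wa, wa };
    v4i16 cb = { wb, wb, wb, wb };

    return ca * a + cb * b;
}
```

Individual lanes of the result can be read back with ordinary subscripting (r[0], r[1], and so on), which is part of the same extension. Code that has to build with other compilers would need to stay with the intrinsics.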
-- 
Richard Earnshaw             Email: Richard.Earnshaw@arm.com
Engineering Manager          Phone: +44 1223 400569 (Direct + VoiceMail)
OpenSource Tools             Switchboard: +44 1223 400400
ARM Ltd                      Fax: +44 1223 400410
110 Fulbourn Rd              Web: http://www.arm.com/
Cambridge, UK. CB1 9NJ