On 9 October 2012 10:37, Jubi Taneja jubitaneja@gmail.com wrote:
Hi All,
I want to compare the objdump output of an application built with VFPv3 support against one built with VFPv4 support. I tried the flags -mfpu=vfpv3 and -mfpu=vfpv4 with an ARM Cortex-A15 toolchain on my test code, but I cannot see any difference between the two objdumps.
Try the following (tested against FSF GCC):
/* arm-none-linux-gnueabi-gcc -mcpu=cortex-a15 -mfpu=vfpv4 -S -o- /tmp/fma.c -mfloat-abi=hard -O2 */
float f(float a, float b, float c)
{
  return a * b + c;
}
/* end of fma.c */
(Note that -mfloat-abi=softfp will also work in this example. Which one you want to use depends on whether your system is configured for the hard-float or soft-float ABI.)
According to my survey, fused multiply-accumulate is the only instruction that can make a difference between the two. Can anyone provide sample test code for it? Specifically, I wish to see the difference in performance between VFPv3 and VFPv4.
I would be surprised if you see much difference at all. VFPv3 has the VMLA (non-fused multiply-accumulate) instruction, which performs an extra rounding step (the product is rounded before the addition), but I expect it to have similar performance characteristics to VFMA.
Note that between -mfpu=vfpv3 and -mfpu=vfpv4 there is also -mfpu=vfpv3-fp16, which adds support for loading and storing half-precision floating-point values. Again, this won't make a performance difference unless you use half-precision as your storage format.
Thanks,
Matt