Hi Matt,
Thanks for sharing the information.
On 9 October 2012 10:37, Jubi Taneja <jubitaneja@gmail.com> wrote:
> Hi All,
>
> I wanted to see the difference in objdump of an application where I can make
> the difference between the VFPV3 and VFPV4 support. I tried enabling the
> flag -mfpu=vfpv3 and -mfpu=vfpv4 for ARM Cortex A15 toolchain in my test
> code but cannot see the difference in two objdumps.

Try the following (tested against FSF GCC):

/* arm-none-linux-gnueabi-gcc -mcpu=cortex-a15 -mfpu=vfpv4 -S -o- /tmp/fma.c -mfloat-abi=hard -O2 */
float f(float a, float b, float c)
{
    return a * b + c;
}
/* end of fma.c */

(Note that -mfloat-abi=softfp will also work in this example. Which
one you want to use depends on whether you have configured your system
for hard or soft-float ABIs.)

> According to my survey, the fused multiply and accumulate is the only
> instruction that can create the difference in two. Can any one provide the
> sample test code for the same? Precisely, I wish to see the difference in
> performance for vfpv3 and vfpv4.

I would be surprised if you see much difference at all. VFPv3 has the
VMLA (non-fused multiply-accumulate) instruction, which does an extra
rounding step, but I expect it will have similar performance
characteristics to VFMA.
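
Where the two do differ is in the result. As a rough, portable C99 model
(not ARM code): mla_unfused rounds the product to float before adding,
as VMLA does, while mla_fused uses the standard fmaf() function, which
rounds only once, as VFMA does. Both function names are made up for
illustration, and fmaf() may need -lm at link time.

#include <math.h>   /* fmaf() from C99; link with -lm */

/* Non-fused multiply-accumulate (VMLA-like): the product is rounded to
   float, then the sum is rounded again. Storing the product in a
   volatile stops the compiler from contracting a * b + c into a single
   fused operation, which GCC may otherwise do on FMA-capable targets. */
float mla_unfused(float a, float b, float c)
{
    volatile float product = a * b;   /* first rounding */
    return product + c;               /* second rounding */
}

/* Fused multiply-accumulate (VFMA-like): a * b + c is computed as if
   exactly and rounded once. */
float mla_fused(float a, float b, float c)
{
    return fmaf(a, b, c);
}

For example, with a = 1 + 2^-12 and c = -(1 + 2^-11), the exact product
a * a is 1 + 2^-11 + 2^-24. Rounding it to float drops the 2^-24 term,
so mla_unfused(a, a, c) returns 0, while mla_fused(a, a, c) returns 2^-24.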

Note that between -mfpu=vfpv3 and -mfpu=vfpv4 there is also
-mfpu=vfpv3-fp16, which adds support for loading and storing
half-precision floating-point values. Again, this won't make a
performance difference unless you use half precision as your storage
format.
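
Using half precision as a storage format means the data stays in memory
as 16-bit values and is widened to single precision before any arithmetic
is done. The following is a portable software sketch of that widening
step, assuming the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits,
10 fraction bits). half_to_float is a hypothetical helper, not a GCC or
ARM API.

#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen one IEEE 754 binary16 value to float. */
float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1fu;   /* biased exponent (bias 15) */
    uint32_t mant = h & 0x3ffu;          /* 10-bit fraction */
    uint32_t bits;
    float f;

    if (exp == 0) {
        /* Zero or subnormal: the value is mant * 2^-24, exact in float. */
        f = (float)mant * 0x1p-24f;
        return sign ? -f : f;
    }
    if (exp == 0x1fu)
        bits = sign | 0x7f800000u | (mant << 13);   /* Inf or NaN */
    else
        bits = sign | ((exp + 127u - 15u) << 23) | (mant << 13); /* rebias */

    memcpy(&f, &bits, sizeof f);   /* reinterpret the bits as a float */
    return f;
}

For example, 0x3C00 widens to 1.0f and 0x7BFF (the largest finite half)
widens to 65504.0f.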
Thanks,
Matt
--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-dann@linaro.org