Test Case to verify the support of VFPV3 and VFPV4


Jubi Taneja

9 Oct 2012

9:37 a.m.

Hi All,

I wanted to see a difference in the objdump output of an application that distinguishes VFPv3 support from VFPv4 support. I tried the flags -mfpu=vfpv3 and -mfpu=vfpv4 with an ARM Cortex-A15 toolchain on my test code, but I cannot see any difference between the two objdumps.

According to my survey, fused multiply-accumulate is the only instruction that differs between the two. Can anyone provide sample test code for it? Specifically, I wish to see the difference in performance between VFPv3 and VFPv4.

Looking forward to your reply.

Thanks and Regards, Jubi
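For the performance comparison requested above, one option is a small multiply-accumulate micro-benchmark that is built twice, once with -mfpu=vfpv3 and once with -mfpu=vfpv4, and timed on the target board. This is only a sketch: the names `dot` and `time_dot` are hypothetical, and it assumes a C99 compiler and that `clock()` is an adequate timer on the board.

```c
#include <stddef.h>
#include <time.h>

/* Multiply-accumulate kernel: acc = x[i] * y[i] + acc is the a * b + c
 * pattern that a compiler may emit as VMLA (VFPv3) or, if it contracts
 * to fused operations, as VFMA (VFPv4). */
float dot(const float *x, const float *y, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc = x[i] * y[i] + acc;
    return acc;
}

/* Calls dot() 'reps' times and returns the CPU time used, in seconds.
 * The running total is stored through 'sink' on every iteration; since
 * 'sink' may alias x or y, this should stop the compiler from hoisting
 * the dot() call out of the loop, and it keeps the result observable. */
double time_dot(const float *x, const float *y, size_t n, int reps, float *sink)
{
    float total = 0.0f;
    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        total += dot(x, y, n);
        *sink = total;
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

A caller would fill two large arrays, call `time_dot()` with a large `reps`, and compare the timings of the two builds; as the replies below suggest, little difference should be expected.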


Matthew Gretton-Dann

9 Oct 2012

10:21 a.m.

On 9 October 2012 10:37, Jubi Taneja jubitaneja@gmail.com wrote:

> Hi All,
>
> I wanted to see a difference in the objdump output of an application that distinguishes VFPv3 support from VFPv4 support. I tried the flags -mfpu=vfpv3 and -mfpu=vfpv4 with an ARM Cortex-A15 toolchain on my test code, but I cannot see any difference between the two objdumps.

Try the following (tested against FSF GCC):

/* arm-none-linux-gnueabi-gcc -mcpu=cortex-a15 -mfpu=vfpv4 -S -o- /tmp/fma.c -mfloat-abi=hard -O2 */
float f(float a, float b, float c)
{
  return a * b + c;
}
/* end of /tmp/fma.c */

(Note that -mfloat-abi=softfp will also work in this example. Which one you want to use depends on whether you have configured your system for hard or soft-float ABIs).

> According to my survey, fused multiply-accumulate is the only instruction that differs between the two. Can anyone provide sample test code for it? Specifically, I wish to see the difference in performance between VFPv3 and VFPv4.

I would be surprised if you see much difference at all. VFPv3 has the VMLA (non-fused multiply-accumulate) instruction, which does an extra rounding step, but I expect it will have similar performance characteristics to VFMA.
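To make the extra rounding step concrete, here is a minimal sketch with inputs where rounding the product before the add changes the answer. Both function names are hypothetical. `mla_unfused` rounds `a * b` to float before adding, as VMLA does; the `volatile` temporary is used here to stop the compiler contracting the expression. `mla_fused_ref` stands in for VFMA by computing in double; that is only exact for inputs like these, where the product and the sum are both exactly representable in double, so it is not a general replacement for `fmaf()`.

```c
/* One rounding after the multiply and another after the add (VMLA-style).
 * Storing the product in a volatile float forces it to be rounded and
 * prevents the compiler from contracting a * b + c into a fused operation. */
float mla_unfused(float a, float b, float c)
{
    volatile float product = a * b;
    return product + c;
}

/* A single rounding of the exact a * b + c (VFMA-style). The product of two
 * floats is exact in double; the sum is only exact for suitable inputs such
 * as the ones below, so this is not a general fmaf() substitute. */
float mla_fused_ref(float a, float b, float c)
{
    double exact = (double)a * (double)b + (double)c;
    return (float)exact;
}
```

With a = 1 + 2^-12 and c = -(1 + 2^-11), the exact product a * a is 1 + 2^-11 + 2^-24. Rounded to float, the 2^-24 term is exactly half an ulp and ties to even, leaving 1 + 2^-11, so `mla_unfused(a, a, c)` returns 0 while `mla_fused_ref(a, a, c)` returns 2^-24.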

Note that between -mfpu=vfpv3 and -mfpu=vfpv4 there is also -mfpu=vfpv3-fp16, which adds support for loading and storing half-precision floating-point values. Again, this won't make a performance difference unless you use half precision as your storage format.
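Using half precision as a storage format means keeping 16-bit values in memory and widening them to single precision for arithmetic. The half-precision extension performs that conversion in hardware; as a rough, portable illustration of what the conversion involves, here is a software decoder for IEEE 754 binary16 bit patterns. The name `half_to_float` is hypothetical, and this is a sketch, not what GCC emits for -mfpu=vfpv3-fp16.

```c
#include <stdint.h>
#include <string.h>

/* Decode an IEEE 754 binary16 bit pattern (1 sign, 5 exponent, 10 fraction
 * bits) into a float. Every binary16 value is exactly representable as float. */
float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp = (h >> 10) & 0x1fu;
    uint32_t frac = h & 0x3ffu;
    uint32_t bits;
    float f;

    if (exp == 0) {
        /* Zero or subnormal: the value is frac * 2^-24, computed exactly. */
        f = (float)frac * 0x1p-24f;
        return sign ? -f : f;
    }
    if (exp == 0x1f)
        bits = sign | 0x7f800000u | (frac << 13);          /* infinity or NaN */
    else
        bits = sign | ((exp + 112u) << 23) | (frac << 13); /* rebias 15 -> 127 */

    memcpy(&f, &bits, sizeof f);  /* reinterpret the bits without aliasing issues */
    return f;
}
```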

Thanks,

Matt

-- Matthew Gretton-Dann Linaro Toolchain Working Group matthew.gretton-dann@linaro.org

Jubi Taneja

10:58 a.m.

Hi Matt,

Thanks for sharing the information.

On Tue, Oct 9, 2012 at 3:51 PM, Matthew Gretton-Dann matthew.gretton-dann@linaro.org wrote:


> Try the following (tested against FSF GCC):
>
> /* arm-none-linux-gnueabi-gcc -mcpu=cortex-a15 -mfpu=vfpv4 -S -o- /tmp/fma.c -mfloat-abi=hard -O2 */
> float f(float a, float b, float c)
> {
>   return a * b + c;
> }
> /* end of /tmp/fma.c */
>
> (Note that -mfloat-abi=softfp will also work in this example. Which one you want to use depends on whether you have configured your system for hard or soft-float ABIs).

I checked with both -mfpu=vfpv3 and -mfpu=vfpv4, and both generate the same assembly code: a VMLA instruction is emitted in both cases. I was wondering whether there is a test case in which I can observe a difference between the two objdumps.


> I would be surprised if you see much difference at all. VFPv3 has the VMLA (non-fused multiply-accumulate) instruction, which does an extra rounding step,

Correct, I checked this.

> but I expect it will have similar performance characteristics to VFMA.

Yes. Since the generated assembly code is the same, there cannot be any performance difference for now.

> Note that between -mfpu=vfpv3 and -mfpu=vfpv4 there is also -mfpu=vfpv3-fp16, which adds support for loading and storing half-precision floating-point values. Again, this won't make a performance difference unless you use half precision as your storage format.

I will check this.

Thanks, Jubi


Matthew Gretton-Dann

11:51 a.m.

On 9 October 2012 11:58, Jubi Taneja jubitaneja@gmail.com wrote:


> I checked with both -mfpu=vfpv3 and -mfpu=vfpv4, and both generate the same assembly code: a VMLA instruction is emitted in both cases. I was wondering whether there is a test case in which I can observe a difference between the two objdumps.

Which compiler are you using? VFMA support is only in trunk FSF GCC. Linaro has not yet backported support to 4.7.

Thanks,

Matt

-- Matthew Gretton-Dann Linaro Toolchain Working Group matthew.gretton-dann@linaro.org

Jubi Taneja

1:44 p.m.

On Tue, Oct 9, 2012 at 5:21 PM, Matthew Gretton-Dann matthew.gretton-dann@linaro.org wrote:


> Which compiler are you using? VFMA support is only in trunk FSF GCC. Linaro has not yet backported support to 4.7.

I am using FSF GCC only.


Matthew Gretton-Dann

6:07 p.m.

On 9 October 2012 14:44, Jubi Taneja jubitaneja@gmail.com wrote:


> I am using FSF GCC only.

What version of GCC is it (what does arm-none-linux-gneabi-gcc -v report)? When I compile the test case above with a recent (within the last month or so) trunk GCC, I get the following output, which uses vfma:

$ /work/builds/gcc-fsf-arm-none-linux-gnueabi/tools/bin/arm-none-linux-gnueabi-gcc -mcpu=cortex-a15 -mfpu=vfpv4 -S -o- /tmp/fma.c -mfloat-abi=hard -O2
        .cpu cortex-a15
        .eabi_attribute 27, 3
        .eabi_attribute 28, 1
        .fpu vfpv4
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 2
        .eabi_attribute 30, 2
        .eabi_attribute 34, 1
        .eabi_attribute 18, 4
        .file "fma.c"
        .text
        .align 2
        .global f
        .type f, %function
f:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        vfma.f32 s2, s0, s1
        fcpys s0, s2
        bx lr
        .size f, .-f
        .ident "GCC: (GNU) 4.8.0 20120913 (experimental)"
        .section .note.GNU-stack,"",%progbits

-- Matthew Gretton-Dann Linaro Toolchain Working Group matthew.gretton-dann@linaro.org

Jubi Taneja

20 Oct 2012

8:41 a.m.

Hi Matthew,

On Tue, Oct 9, 2012 at 11:37 PM, Matthew Gretton-Dann matthew.gretton-dann@linaro.org wrote:


> What version of GCC is it (what does arm-none-linux-gneabi-gcc -v report)?

# arm-none-linux-gneabi-gcc -v
Using built-in specs.
COLLECT_GCC=arm-none-linux-gneabi-gcc
COLLECT_LTO_WRAPPER=/opt/toolchains/arm/bin/../libexec/gcc/arm-none-linux-gneabi/4.6.3/lto-wrapper
Target: arm-none-linux-gneabi
Configured with: /home/user/arm-src/build/sources/gcc_1/configure --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu --target=arm-none-linux-gneabi --prefix=/opt/arm --with-sysroot=/opt/arm/arm-none-linux-gneabi/sys-root --disable-libmudflap --disable-libssp --disable-libgomp --disable-nls --disable-libstdcxx-pch --with-interwork --with-mode=arm --with-fpu=vfp3 --with-cpu=cortex-a9 --with-tune=cortex-a9 --with-float=softfp --enable-extra-vd-multilibs --enable-poison-system-directories --enable-long-long --enable-threads --enable-languages=c,c++ --enable-shared --enable-lto --enable-symvers=gnu --enable-__cxa_atexit --with-pkgversion=arm-toolchain.v1 --with-gnu-as --with-gnu-ld --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --with-build-time-tools=/opt/arm/bin --with-gmp=/opt/arm --with-mpfr=/opt/arm --with-ppl=/opt/arm --with-cloog=/opt/arm --with-libelf=/opt/arm
Thread model: posix
gcc version 4.6.3 (arm-toolchain.v1)

> When I compile the test case above with a recent (within the last month or so) trunk GCC, I get the following output, which uses vfma:
>
> $ /work/builds/gcc-fsf-arm-none-linux-gnueabi/tools/bin/arm-none-linux-gnueabi-gcc -mcpu=cortex-a15 -mfpu=vfpv4 -S -o- /tmp/fma.c -mfloat-abi=hard -O2
>         .cpu cortex-a15
>         .eabi_attribute 27, 3
>         .eabi_attribute 28, 1
>         .fpu vfpv4
>         .eabi_attribute 20, 1
>         .eabi_attribute 21, 1
>         .eabi_attribute 23, 3
>         .eabi_attribute 24, 1
>         .eabi_attribute 25, 1
>         .eabi_attribute 26, 2
>         .eabi_attribute 30, 2
>         .eabi_attribute 34, 1
>         .eabi_attribute 18, 4
>         .file "fma.c"
>         .text
>         .align 2
>         .global f
>         .type f, %function
> f:
>         @ args = 0, pretend = 0, frame = 0
>         @ frame_needed = 0, uses_anonymous_args = 0
>         @ link register save eliminated.
>         vfma.f32 s2, s0, s1
>         fcpys s0, s2
>         bx lr
>         .size f, .-f
>         .ident "GCC: (GNU) 4.8.0 20120913 (experimental)"
>         .section .note.GNU-stack,"",%progbits


$ arm-none-linux-gnueabi-gcc -mcpu=cortex-a15 -mfpu=vfpv4 -S -o- prog.c -O2
        .cpu cortex-a15
        .eabi_attribute 27, 3
        .fpu vfpv4
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 2
        .eabi_attribute 30, 2
        .eabi_attribute 34, 0
        .eabi_attribute 18, 4
        .file "prog.c"
        .section .text.f,"ax",%progbits
        .align 2
        .global f
        .type f, %function
f:
        .fnstart
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        fmsr s14, r0
        fmsr s13, r2
        fmsr s15, r1
        fmacs s13, s14, s15
        fmrs r0, s13
        bx lr
        .fnend
        .size f, .-f
        .section .text.startup.main,"ax",%progbits
        .align 2
        .global main
        .type main, %function
main:
        .fnstart
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        bx lr
        .fnend
        .size main, .-main
        .ident "GCC: (VDLinux.GA1.2012-10-03) 4.6.4"
        .section .note.GNU-stack,"",%progbits

I cannot explain the difference between the two results, nor draw an overall conclusion for my query. Can you please guide me on how to dig deeper into it?

Jubi


Matthew Gretton-Dann

22 Oct 2012

9:12 a.m.

On 20 October 2012 09:41, Jubi Taneja jubitaneja@gmail.com wrote:


> # arm-none-linux-gneabi-gcc -v
> Using built-in specs.
> COLLECT_GCC=arm-none-linux-gneabi-gcc
> COLLECT_LTO_WRAPPER=/opt/toolchains/arm/bin/../libexec/gcc/arm-none-linux-gneabi/4.6.3/lto-wrapper
> Target: arm-none-linux-gneabi
> Configured with: /home/user/arm-src/build/sources/gcc_1/configure --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu --target=arm-none-linux-gneabi --prefix=/opt/arm --with-sysroot=/opt/arm/arm-none-linux-gneabi/sys-root --disable-libmudflap --disable-libssp --disable-libgomp --disable-nls --disable-libstdcxx-pch --with-interwork --with-mode=arm --with-fpu=vfp3 --with-cpu=cortex-a9 --with-tune=cortex-a9 --with-float=softfp --enable-extra-vd-multilibs --enable-poison-system-directories --enable-long-long --enable-threads --enable-languages=c,c++ --enable-shared --enable-lto --enable-symvers=gnu --enable-__cxa_atexit --with-pkgversion=arm-toolchain.v1 --with-gnu-as --with-gnu-ld --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --with-build-time-tools=/opt/arm/bin --with-gmp=/opt/arm --with-mpfr=/opt/arm --with-ppl=/opt/arm --with-cloog=/opt/arm --with-libelf=/opt/arm
> Thread model: posix
> gcc version 4.6.3 (arm-toolchain.v1)

This is GCC 4.6.3, not current trunk (which would report gcc version 4.8.0).

GCC 4.6.3 does not support VFMA.
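Because the missing VFMA came down to the compiler version, a test program can report at compile time what its compiler advertises. A minimal sketch, assuming a compiler that follows the ARM C Language Extensions (whose `__ARM_FEATURE_FMA` macro signals fused multiply-accumulate support) and using the C99 `FP_FAST_FMAF` macro from `<math.h>`. Older compilers such as GCC 4.6 may define neither, so a negative answer means "not advertised", not "not supported by the hardware". The function names are hypothetical, and the 4.8 threshold assumes the trunk work mentioned here is what shipped as GCC 4.8.

```c
#include <math.h>  /* FP_FAST_FMAF (C99), when the implementation defines it */

/* Reports what this translation unit's compiler advertises about fused
 * multiply-add, based only on predefined macros. */
const char *fma_status(void)
{
#if defined(__ARM_FEATURE_FMA)
    return "ARM fused multiply-accumulate (e.g. VFPv4) enabled";
#elif defined(FP_FAST_FMAF)
    return "fmaf() is fast on this target";
#else
    return "no fused multiply-add advertised";
#endif
}

/* Returns 1 when compiled by GCC 4.8 or newer. Other compilers that define
 * __GNUC__ (for example clang) report their own values, so this is only a
 * rough indicator. */
int gcc_at_least_4_8(void)
{
#if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
    return 1;
#else
    return 0;
#endif
}
```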

Thanks,

Matt

-- Matthew Gretton-Dann Linaro Toolchain Working Group matthew.gretton-dann@linaro.org

Jubi Taneja

3:26 p.m.

Thanks, Matthew.

Jubi


Peter Maydell

9 Oct 2012

11:09 a.m.

On 9 October 2012 11:21, Matthew Gretton-Dann matthew.gretton-dann@linaro.org wrote:


> Try the following (tested against FSF GCC):
>
> /* arm-none-linux-gnueabi-gcc -mcpu=cortex-a15 -mfpu=vfpv4 -S -o- /tmp/fma.c -mfloat-abi=hard -O2 */
> float f(float a, float b, float c)
> {
>   return a * b + c;
> }
> /* end of /tmp/fma.c */

I would have expected that you would need a gcc option to tell it that non-IEEE floating point results are OK. Otherwise it's not valid to emit a fused multiply-add for a*b+c because IEEE specifies that you should get a rounding step between the multiply and the add. Or does gcc default to non-IEEE arithmetic?

-- PMM

Mans Rullgard

11:46 a.m.

On 9 October 2012 12:09, Peter Maydell peter.maydell@linaro.org wrote:


> I would have expected that you would need a gcc option to tell it that non-IEEE floating point results are OK. Otherwise it's not valid to emit a fused multiply-add for a*b+c because IEEE specifies that you should get a rounding step between the multiply and the add. Or does gcc default to non-IEEE arithmetic?

Maybe adding -ffast-math does something.

-- Mans Rullgard / mru

Jubi Taneja

1:43 p.m.

On Tue, Oct 9, 2012 at 5:16 PM, Mans Rullgard mans.rullgard@linaro.org wrote:


> Maybe adding -ffast-math does something.

I have checked; it does not make any difference.


linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain

Matthew Gretton-Dann

11:49 a.m.

On 9 October 2012 12:09, Peter Maydell peter.maydell@linaro.org wrote:


> I would have expected that you would need a gcc option to tell it that non-IEEE floating point results are OK. Otherwise it's not valid to emit a fused multiply-add for a*b+c because IEEE specifies that you should get a rounding step between the multiply and the add. Or does gcc default to non-IEEE arithmetic?

GCC defaults to -ffp-contract=fast, which, according to the manual:

    "enables floating-point expression contraction such as forming of fused multiply-add operations if the target has native support for them"

This, as Peter notes, is not IEEE compliant.
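Whether a particular build actually contracted `a * b + c` can also be checked at run time, using inputs where the separately rounded and the fused results differ. This is a sketch under stated assumptions: `contraction_observed` is a hypothetical name, the `volatile` sources only keep the compiler from folding the expression at compile time, and it assumes float arithmetic is evaluated in float (FLT_EVAL_METHOD 0, as on ARM VFP). The answer depends on the build flags (for example -ffp-contract=fast versus -ffp-contract=off), possibly on the optimization level, and on whether the target has a fused instruction.

```c
/* Returns 1 if a * b + c was evaluated with a single rounding (that is, it was
 * contracted into a fused multiply-add), 0 if the product was rounded first. */
int contraction_observed(void)
{
    /* volatile loads stop constant folding, so the arithmetic happens at run time */
    volatile float va = 1.0f + 0x1p-12f;
    volatile float vc = -(1.0f + 0x1p-11f);
    float a = va, c = vc;

    /* The exact product is 1 + 2^-11 + 2^-24. Rounded to float it becomes
     * 1 + 2^-11, so the unfused result is 0; a fused result keeps 2^-24. */
    float r = a * a + c;
    return r != 0.0f;
}
```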

Thanks,

Matt

-- Matthew Gretton-Dann Linaro Toolchain Working Group matthew.gretton-dann@linaro.org

Jubi Taneja

1:45 p.m.

On Tue, Oct 9, 2012 at 4:39 PM, Peter Maydell peter.maydell@linaro.org wrote:


> I would have expected that you would need a gcc option to tell it that non-IEEE floating point results are OK. Otherwise it's not valid to emit a fused multiply-add for a*b+c because IEEE specifies that you should get a rounding step between the multiply and the add. Or does gcc default to non-IEEE arithmetic?

I need to know more about it. I can comment after verifying it.



participants (4)

Jubi Taneja
Mans Rullgard
Matthew Gretton-Dann
Peter Maydell