Hi Matthew
On Tue, Oct 9, 2012 at 11:37 PM, Matthew Gretton-Dann <matthew.gretton-dann@linaro.org> wrote:
On 9 October 2012 14:44, Jubi Taneja <jubitaneja@gmail.com> wrote:
On Tue, Oct 9, 2012 at 5:21 PM, Matthew Gretton-Dann <matthew.gretton-dann@linaro.org> wrote:
/* arm-none-linux-gnueabi-gcc -mcpu=cortex-a15 -mfpu=vfpv4 -S -o- /tmp/fma.c -mfloat-abi=hard -O2 */
float f(float a, float b, float c)
{
  return a * b + c;
}
/* end of fma.c */
(Note that -mfloat-abi=softfp will also work in this example. Which one you want to use depends on whether you have configured your system for the hard-float or soft-float ABI.)
I checked with both -mfpu=vfpv3 and -mfpu=vfpv4, and both generate the same assembly code: a VMLA instruction is emitted in both cases. I was wondering if I could get a test case that lets me observe a difference between the two objdumps.
Which compiler are you using? VFMA support is only in trunk FSF GCC. Linaro has not yet backported support to 4.7.
I am using FSF GCC only.
Which version of GCC is it (what does arm-none-linux-gneabi-gcc -v report)?
# arm-none-linux-gneabi-gcc -v
Using built-in specs.
COLLECT_GCC=arm-none-linux-gneabi-gcc
COLLECT_LTO_WRAPPER=/opt/toolchains/arm/bin/../libexec/gcc/arm-none-linux-gneabi/4.6.3/lto-wrapper
Target: arm-none-linux-gneabi
Configured with: /home/user/arm-src/build/sources/gcc_1/configure --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu --target=arm-none-linux-gneabi --prefix=/opt/arm --with-sysroot=/opt/arm/arm-none-linux-gneabi/sys-root --disable-libmudflap --disable-libssp --disable-libgomp --disable-nls --disable-libstdcxx-pch --with-interwork --with-mode=arm --with-fpu=vfp3 --with-cpu=cortex-a9 --with-tune=cortex-a9 --with-float=softfp --enable-extra-vd-multilibs --enable-poison-system-directories --enable-long-long --enable-threads --enable-languages=c,c++ --enable-shared --enable-lto --enable-symvers=gnu --enable-__cxa_atexit --with-pkgversion=arm-toolchain.v1 --with-gnu-as --with-gnu-ld --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --with-build-time-tools=/opt/arm/bin --with-gmp=/opt/arm --with-mpfr=/opt/arm --with-ppl=/opt/arm --with-cloog=/opt/arm --with-libelf=/opt/arm
Thread model: posix
gcc version 4.6.3 (arm-toolchain.v1)
When I compile the test case above with a recent (within the last month or so) trunk GCC, I get the following output, which uses vfma:
$ /work/builds/gcc-fsf-arm-none-linux-gnueabi/tools/bin/arm-none-linux-gnueabi-gcc -mcpu=cortex-a15 -mfpu=vfpv4 -S -o- /tmp/fma.c -mfloat-abi=hard -O2
	.cpu cortex-a15
	.eabi_attribute 27, 3
	.eabi_attribute 28, 1
	.fpu vfpv4
	.eabi_attribute 20, 1
	.eabi_attribute 21, 1
	.eabi_attribute 23, 3
	.eabi_attribute 24, 1
	.eabi_attribute 25, 1
	.eabi_attribute 26, 2
	.eabi_attribute 30, 2
	.eabi_attribute 34, 1
	.eabi_attribute 18, 4
	.file "fma.c"
	.text
	.align 2
	.global f
	.type f, %function
f:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vfma.f32 s2, s0, s1
	fcpys s0, s2
	bx lr
	.size f, .-f
	.ident "GCC: (GNU) 4.8.0 20120913 (experimental)"
	.section .note.GNU-stack,"",%progbits
--
$ arm-none-linux-gnueabi-gcc -mcpu=cortex-a15 -mfpu=vfpv4 -S -o- prog.c -O2
	.cpu cortex-a15
	.eabi_attribute 27, 3
	.fpu vfpv4
	.eabi_attribute 20, 1
	.eabi_attribute 21, 1
	.eabi_attribute 23, 3
	.eabi_attribute 24, 1
	.eabi_attribute 25, 1
	.eabi_attribute 26, 2
	.eabi_attribute 30, 2
	.eabi_attribute 34, 0
	.eabi_attribute 18, 4
	.file "prog.c"
	.section .text.f,"ax",%progbits
	.align 2
	.global f
	.type f, %function
f:
	.fnstart
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	fmsr s14, r0
	fmsr s13, r2
	fmsr s15, r1
	fmacs s13, s14, s15
	fmrs r0, s13
	bx lr
	.fnend
	.size f, .-f
	.section .text.startup.main,"ax",%progbits
	.align 2
	.global main
	.type main, %function
main:
	.fnstart
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	bx lr
	.fnend
	.size main, .-main
	.ident "GCC: (VDLinux.GA1.2012-10-03) 4.6.4"
	.section .note.GNU-stack,"",%progbits
I could not work out what the difference between the two results means, or what the overall answer to my query is. Could you please guide me on how to dig deeper into this?
Jubi
Matthew Gretton-Dann Linaro Toolchain Working Group matthew.gretton-dann@linaro.org