On 19 March 2013 21:56, Renato Golin <renato.golin@linaro.org> wrote:
Basically, LLVM chooses to lower single-precision FMUL to NEON's VMUL.f32 instead of VFP's version because, on some cores (Cortex-A8, Cortex-A5 and Apple's Swift), the VFP variant is really slow.
This is all cool and dandy, but NEON is not IEEE 754 compliant: it always runs in flush-to-zero mode, so subnormal results are squashed to zero and the result can be slightly different. So slightly that only one test, one that was really pushing the boundaries (i.e. going below FLT_MIN), caught it.
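For the record, here is a minimal sketch of the kind of test that catches this (the file name and values are illustrative, and it has to be built for an ARM target where the multiply actually gets lowered to NEON):

    /* denorm.c - FLT_MIN * 0.5f has a subnormal exact result */
    #include <stdio.h>
    #include <float.h>

    int main(void) {
        volatile float tiny = FLT_MIN;  /* smallest normal float, 2^-126 */
        volatile float half = 0.5f;
        float r = tiny * half;          /* exact result is 2^-127, subnormal */
        /* IEEE 754 (VFP): r is the subnormal 0x1p-127.
         * NEON flush-to-zero: r == 0.0f. */
        printf("r = %a, flushed to zero: %d\n", r, r == 0.0f);
        return 0;
    }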
There are two ways we can go here:
- Strict IEEE compliance by default, and *only* lower to NEON's VMUL if
unsafe-math is on. This will make generic single-precision code slower, but
you can always turn unsafe-math on if you want more speed.
- Continue using NEON for f32 by default and put a note somewhere that
people should turn this option (FeatureNEONForFP) off on Cortex-A5/A8 if
they *really* care about maximum IEEE compliance (see the sketch just below
this list).
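To make the second option concrete: assuming FeatureNEONForFP is exposed as the "neonfp" subtarget attribute, the two lowerings can be compared side by side on a trivial input (the file name is mine):

    ; mul.ll - a single f32 multiply with no fast-math flags
    define float @f(float %a, float %b) {
      %r = fmul float %a, %b
      ret float %r
    }

    llc -mtriple=armv7-linux-gnueabi -mattr=+neon,+neonfp mul.ll  # vmul.f32 on d-registers (NEON)
    llc -mtriple=armv7-linux-gnueabi -mattr=+neon,-neonfp mul.ll  # vmul.f32 on s-registers (VFP)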
Apple has already said that, for Darwin, the second option is still the one of choice. Do we agree and ignore this issue, or do we want strict conformance by default for GNU/EABI?
This seems straightforward to me. You have a user-facing flag for controlling whether you can deviate from IEEE 754 in the name of performance (unsafe-math), so you should honour it. This has the secondary advantage of following gcc behaviour, and the primary advantage of not being confusing or requiring people to use architecture-specific feature flags just to get standard FP behaviour.
Anybody actually writing performance-critical code that uses 32-bit floats can apply unsafe-math if it helps them.
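At the user level that is just the usual flags; something like this (hot.c is a placeholder name, and -ffast-math implies the unsafe-math optimisations):

    # Strict IEEE 754 single-precision by default under this scheme (VFP):
    clang -O2 -target armv7-linux-gnueabi -S hot.c
    # Opting in to speed over strictness; NEON's vmul.f32 is fair game:
    clang -O2 -ffast-math -target armv7-linux-gnueabi -S hot.c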
-- PMM