Re: question on aarch64 libm

19 Jan 2016


On 19-01-2016 03:49, Siddhesh Poyarekar wrote:
...
On 19 January 2016 at 00:06, Adhemerval Zanella
adhemerval.zanella@linaro.org wrote:
...
No one has posted any patch or stirred up discussion about it.  The complex
functions in libm are usually coded in C to be platform neutral, with
some specific functions being optimized (rounding, etc.). x86_64 also has
some assembly implementations for specific routines (exp, log, ...),
but I do not have numbers on how fast they are relative to their C
counterparts (it might also be the case that the speedup is not high
enough to justify keeping the assembly).
A correction here: i686 has a lot of assembly math implementations,
x86_64 doesn't.  The last x86_64 asm implementation was sincos which
was removed because it was not accurate enough for our project goals.
The i686 asm versions (and for other archs, I think alpha and m68k)
are there because nobody cares enough about their precision.  The i686
functions for example are known to not be precise for the entire input
domain.
I do see some x86_64 specialized implementations being used currently
(sysdeps/x86_64/fpu/s_{sin,cos}f.S for instance). The sincos implementation
is still used (sysdeps/x86_64/fpu/s_sincosf.S).
What you are referring to as dropped by glibc is the use of the
fsin/fcos/fsincos Intel instructions, which show a ridiculous error
range depending on the inputs [1].
[1] https://randomascii.wordpress.com/2014/10/09/intel-underestimates-error-boun...
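To make "error range" concrete: libm accuracy is usually quoted in ULPs (units in the last place), i.e. how many representable floating-point values lie between the computed result and the correctly rounded one. The following is only a minimal sketch of such a measurement for single precision, not code from glibc; the helper names float_to_ordered and ulp_distance are made up for illustration, and it assumes IEEE 754 binary32 floats and finite, non-NaN inputs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: map a float's bit pattern onto an integer line
   that increases monotonically with the float's value.
   Assumes IEEE 754 binary32 and a finite, non-NaN input. */
static int64_t float_to_ordered(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* well-defined type pun */
    /* Floats are sign-magnitude: mirror the negative half so that
       -0.0f and +0.0f both land on 0 and ordering is preserved. */
    if (bits & 0x80000000u)
        return -(int64_t)(bits & 0x7fffffffu);
    return (int64_t)bits;
}

/* Hypothetical helper: number of representable floats between a and b. */
static int64_t ulp_distance(float a, float b)
{
    int64_t d = float_to_ordered(a) - float_to_ordered(b);
    return d < 0 ? -d : d;
}
```

A test harness would call ulp_distance(computed, reference), with the reference taken from a higher-precision evaluation rounded to float, and track the maximum over the input domain.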
...
...
The rule of thumb currently in GLIBC is to avoid arch-specific assembly
routines as much as possible and work with C implementations that are
platform neutral, with possible arch hooks on performance-sensitive paths
(check Siddhesh's recent sincos performance improvements).
The general rule here is to more or less guarantee that the algorithm
does not lose precision regardless of the language it is written in.
However if you want the community also to support it actively, writing
it in C is your best bet.
...
For very critical performance paths we also have the option to add
specific builds with more aggressive optimization flags along with
IFUNC support (for instance one for A57 and another for A72, if
that is the case).
This is the cheapest way to squeeze out some performance, provided
that the compiler is tuned correctly.  This is in fact what we do in
x86_64 with ifunc implementations for avx, sse2 and fma4.
Siddhesh
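
For readers unfamiliar with the idea, the dispatch described above can be sketched roughly as follows. This is a portable stand-in that uses a cached function pointer, not glibc's actual IFUNC machinery (with GCC on ELF targets, the ifunc function attribute has the dynamic linker run the resolver once at symbol resolution instead). All names here (scale, scale_generic, scale_tuned, cpu_has_feature, resolve_scale) are hypothetical, and the CPU probe is a stub that always picks the generic variant.

```c
#include <stddef.h>

typedef double (*scale_fn)(double);

/* Two builds of the same routine. In a real setup these would be the
   same C source compiled with different optimization flags. */
static double scale_generic(double x) { return x * 2.0; }
static double scale_tuned(double x)   { return x + x; }

/* Hypothetical CPU probe; a real one would query hardware capabilities.
   Stubbed to "feature absent" so the sketch behaves the same everywhere. */
static int cpu_has_feature(void) { return 0; }

/* Resolver: picks an implementation based on the probe. */
static scale_fn resolve_scale(void)
{
    return cpu_has_feature() ? scale_tuned : scale_generic;
}

/* Cached choice, filled in on the first call. */
static scale_fn scale_impl;

/* Public entry point: resolve once, then call through the pointer. */
double scale(double x)
{
    if (scale_impl == NULL)
        scale_impl = resolve_scale();
    return scale_impl(x);
}
```

Every variant has to return results of the same accuracy; only the speed may differ, which is why this approach fits the precision rule discussed earlier in the thread.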
