On Wed, 3 Nov 2010 14:57:01 +0200, Ira Rosen <IRAR@il.ibm.com> wrote:
-mfloat-abi=softfp/-mfloat-abi=hard -mfpu=neon* [-march=armv7-a]
- there are several variants of this (e.g. neon, neon-fp16, neon-vfpv4, ...), but generally plain -mfpu=neon should do, at least for Cortex-A8 chips.
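(For the record, a trivial loop like the sketch below, built with something along the lines of the flags above, is enough to check that the NEON vectorizer kicks in. The file and function names are made up, and the exact command line will depend on your cross compiler, so treat it as a sketch only.)

/* saxpy.c: try something like
     gcc -O2 -ftree-vectorize -ffast-math \
         -mfpu=neon -mfloat-abi=softfp -march=armv7-a -S saxpy.c
   and look for vmul.f32/vadd.f32 (or vmla.f32) in the output.
   -ffast-math matters here because NEON float arithmetic is
   flush-to-zero, so GCC won't vectorize FP code without unsafe
   math optimizations enabled.  */

void
saxpy (float *__restrict__ y, const float *__restrict__ x, float a, int n)
{
  int i;
  for (i = 0; i < n; i++)
    y[i] += a * x[i];
}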
gcc.dg/vect/vect.exp runs all the vectorizer tests in the gcc.dg/vect directory. I was wondering why the only flag used for NEON is -ffast-math, and why other flags, like -mfpu=neon, are not used.
I'm not sure about this. There might be dejagnu magic somewhere, or we might rely on e.g. multilib options or configured-in defaults to turn NEON on as appropriate.
* config/arm/arm.c (arm_autovectorize_vector_sizes): New function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Define.
Looks fine to me (though I'm not an ARM maintainer, so can't approve it upstream). It's a bit of a pity the hook is defined in such a way that it doesn't support non-power-of-two sized vectors, but never mind...
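(For anyone following along, I'm picturing the hook as something like the sketch below. This is reconstructed from the ChangeLog rather than copied from the patch, so the exact condition and return value may well differ; but a bitmask-of-sizes-in-bytes interface is what rules out non-power-of-two sizes.)

/* Sketch only: return a bitmask of the vector sizes, in bytes, that
   the vectorizer may try.  Since the sizes are encoded as bits, only
   power-of-two sizes can be expressed unambiguously.  */

static unsigned int
arm_autovectorize_vector_sizes (void)
{
  /* Offer 16-byte (Q-reg) and 8-byte (D-reg) NEON vectors.  */
  return TARGET_NEON ? (16 | 8) : 0;
}

#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
  arm_autovectorize_vector_sizes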
I'm curious: which platform has non-power-of-two sized vectors?
I think I was muddling up non-power-of-two vectors with vector "group" sizes used by e.g. vld3/vst3, but of course NEON doesn't have the former, and the latter are entirely irrelevant in this context. So, apologies for the noise :-).
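(To illustrate what I meant by "group" sizes: vld3/vst3 work on groups of three interleaved elements, as in the hypothetical intrinsics snippet below, but each individual vector is still an ordinary 8- or 16-byte one.)

#include <arm_neon.h>

/* vld3 loads three interleaved streams (e.g. R, G, B bytes) into
   three D registers at once: a "group" of 3, even though each
   vector on its own is a plain 8-byte one.  */
void
split_rgb (const uint8_t *src, uint8_t *r, uint8_t *g, uint8_t *b)
{
  uint8x8x3_t pix = vld3_u8 (src);
  vst1_u8 (r, pix.val[0]);
  vst1_u8 (g, pix.val[1]);
  vst1_u8 (b, pix.val[2]);
}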
(I did come across a MIPS variant with length-3 vectors at one point, but of course that's pretty much irrelevant here also. AFAIK it's not even publicly documented.)
I wonder if this allows us to remove -mvectorize-with-neon-quad already (or, perhaps, wire it on but make the option a no-op, for possible backward-compatibility reasons)?
What vector size do we want as the default? We can make 128 the default and fall back to 64 if vectorization fails for some reason (or we can always start with 64 and switch to 128 if necessary). We could also add -mvectorize-with-neon-double, and use it together with -mvectorize-with-neon-quad to set the vector size and prevent auto-detection.
I think it's probably fine to default to 128-bit vectors, and fall back to 64-bits when necessary (where access patterns block usage of wider vectors, or similar). AIUI, ARM were quite keen to get rid of -mvectorize-with-neon-quad altogether, so I'm not sure it makes sense to add a new -double option also: particularly since with widening/narrowing operations, both vector sizes are generally needed simultaneously.
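(As a concrete example of the widening point: in a hypothetical loop like the one below, the widening multiply (vmull.s16) reads 64-bit vectors of shorts and writes 128-bit vectors of ints, so D and Q registers are both in use whichever size we nominally "prefer".)

/* Each vmull.s16 consumes a 64-bit vector of shorts and produces a
   128-bit vector of ints, so both register widths are needed.  */
void
widen_mult (int *__restrict__ out, const short *__restrict__ a,
            const short *__restrict__ b, int n)
{
  int i;
  for (i = 0; i < n; i++)
    out[i] = (int) a[i] * (int) b[i];
}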
The best solution would be to evaluate the costs for both size options, but that is a fair amount of work. Also, the unknown loop bound case will require versioning between the two vector options, in addition to the possible versioning between vector and scalar loops.
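(Just to spell out what that versioning would look like: very roughly the structure below, with the vector bodies written as plain C for clarity rather than as the NEON code the vectorizer would actually emit. The runtime checks and the extra copies of the loop are the cost we'd be paying.)

/* Rough sketch of versioning between a 128-bit loop, a 64-bit loop
   and a scalar loop for an unknown trip count n.  */
void
vadd (float *__restrict__ out, const float *__restrict__ a,
      const float *__restrict__ b, int n)
{
  int i = 0;

  if (n >= 4)
    /* "128-bit" version: four elements per iteration.  */
    for (; i + 4 <= n; i += 4)
      {
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
        out[i + 2] = a[i + 2] + b[i + 2];
        out[i + 3] = a[i + 3] + b[i + 3];
      }
  else if (n >= 2)
    /* "64-bit" version: two elements per iteration.  */
    for (; i + 2 <= n; i += 2)
      {
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
      }

  /* Scalar epilogue, and fallback for tiny n.  */
  for (; i < n; i++)
    out[i] = a[i] + b[i];
}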
I don't know if we can make a decision without tuning, especially since:
- NEON hardware available at the time (Cortex-A8) only processed data in 64-bit chunks, so Q-reg operations weren't necessarily any faster than D-reg operations (that may still be true).
This is why I thought that starting with the option to switch to 64 if 128 fails (with the -mvectorize-with-neon-quad flag) would be the least intrusive approach.
I'm not sure. The best option may well depend on the particular core (A8 vs A9 vs A15), and users will generally want to have the right option (whatever that turns out to be) as the default, without having to grub around in the documentation.
(Maybe if we make -mvectorize-with-neon-quad "wired-on" but otherwise a no-op, we could add e.g. a --param to say "prefer 64-bit vectors" or "prefer 128-bit vectors" (falling back to 64-bit as necessary), for benchmarking purposes and/or intrepid users.)
CC'ing Richard E., in case he has any input.
Cheers,
Julian