Re: Auto-detection of vector size for NEON

3 Nov 2010

      On Mon, 1 Nov 2010 15:57:11 +0200
Ira Rosen IRAR@il.ibm.com wrote:
...
It looks like it's enough to implement targetm.vectorize.
autovectorize_vector_sizes for NEON in order to enable initial
auto-detection of vector size. With the attached patch and
-mvectorize-with-neon-quad flag, the vectorizer first tries to
vectorize for 128 bit, and if this fails, it tries to vectorize for
64 bit. For example, in the attached testcase number of iterations is
too small for 128 bit (first 2 iterations have to be peeled in order
to align the array accesses), but is sufficient for 64 bit (the
accesses are aligned here).
Cool, I hadn't found out about that hook yet (nor that
UNITS_PER_SIMD_WORD had disappeared in favour of
TARGET_VECTORIZE_PREFERRED_SIMD_MODE!).
...
I'd appreciate your comments on the patch, and I also have a few
questions: 1. Why the default vector size is 64?
Ah, well, that was a decision made several years ago, and may no longer
be the optimal choice. There are several possible reasons:
1. NEON hardware available at the time (Cortex-A8) only processed data
in 64-bit chunks, so Q-reg operations weren't necessarily any faster
than D-reg operations (that may still be true).
2. Initially misaligned operations were not supported, so larger
vectors increase the chances that loop versioning would be required.
3. Generally, setup/tear-down code is assumed to be larger for wider
vectors, and ARM hardware generally isn't very well-endowed with cache.
The choice wasn't supported by benchmarking, since IIRC we didn't
have real hardware when we implemented initial NEON support, and
nobody's revisited the decision since.
...

Where is the place of NEON vectorization tests? I found NEON tests

with intrinsics at gcc.target/arm, is that the right place?
Looks like a good place to me. It might be an idea to avoid putting
stuff in gcc.target/arm/neon, since most of those tests are
auto-generated.
...

According to gcc.dg/vect/vect.exp the only flag that is used for

NEON (in addition to target independent flags) is -ffast-math. Is
that enough?
I'm not sure what you're asking here: there aren't any other
target-specific flags to tweak NEON behaviour. Generally to enable NEON
you need to set the float ABI and FPU variant (if they're not
configured-in to the compiler), e.g.:
-mfloat-abi=softfp/-mfloat-abi=hard -mfpu=neon* [-march=armv7-a]
* there are several variants for this, e.g. neon, neon-fp16,
neon-vfpv4 ... generally -mfpu=neon should do though, for Cortex-A8
chips at least.
...

config/arm/arm.c (arm_autovectorize_vector_sizes): New

function.
   (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Define.
Looks fine to me (though I'm not an ARM maintainer, so can't approve it
upstream). It's a bit of a pity the hook is defined in such a way that
it doesn't support non-power-of-two sized vectors, but never mind...
I wonder if this allows us to remove -mvectorize-with-neon-quad
already (or, perhaps, wire it on but make the option a no-op, for
possible backward-compatibility reasons)?
Thanks,
Julian

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

Re: Auto-detection of vector size for NEON