On Wed, 3 Nov 2010 14:57:01 +0200, Ira Rosen <IRAR@il.ibm.com> wrote:
-mfloat-abi=softfp/-mfloat-abi=hard -mfpu=neon* [-march=armv7-a]
- there are several variants of this (e.g. neon, neon-fp16, neon-vfpv4, ...), but generally plain -mfpu=neon should do, at least for Cortex-A8 chips.
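(For the record, a trivial loop like the sketch below, built with something along the lines of the flags above, is enough to check that the NEON vectorizer kicks in. The file and function names are made up, and the exact command line will depend on your cross compiler, so treat it as a sketch only.)

/* saxpy.c: try something like
     gcc -O2 -ftree-vectorize -ffast-math \
         -mfpu=neon -mfloat-abi=softfp -march=armv7-a -S saxpy.c
   and look for vmul.f32/vadd.f32 (or vmla.f32) in the output.
   -ffast-math matters here because NEON float arithmetic is
   flush-to-zero, so GCC won't vectorize FP code without unsafe
   math optimizations enabled.  */

void
saxpy (float *__restrict__ y, const float *__restrict__ x, float a, int n)
{
  int i;
  for (i = 0; i < n; i++)
    y[i] += a * x[i];
}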
gcc.dg/vect/vect.exp runs all the vectorizer tests in the gcc.dg/vect directory. I was wondering why the only flag used for NEON is -ffast-math, and why other flags, like -mfpu=neon, are not used.
I'm not sure about this. There might be dejagnu magic somewhere, or we might rely on e.g. multilib options or configured-in defaults to turn NEON on as appropriate.
* config/arm/arm.c (arm_autovectorize_vector_sizes): New function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Define.
Looks fine to me (though I'm not an ARM maintainer, so can't approve it upstream). It's a bit of a pity the hook is defined in such a way that it doesn't support non-power-of-two sized vectors, but never mind...
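(For anyone following along, I'm picturing the hook as something like the sketch below. This is reconstructed from the ChangeLog rather than copied from the patch, so the exact condition and return value may well differ; but a bitmask-of-sizes-in-bytes interface is what rules out non-power-of-two sizes.)

/* Sketch only: return a bitmask of the vector sizes, in bytes, that
   the vectorizer may try.  Since the sizes are encoded as bits, only
   power-of-two sizes can be expressed unambiguously.  */

static unsigned int
arm_autovectorize_vector_sizes (void)
{
  /* Offer 16-byte (Q-reg) and 8-byte (D-reg) NEON vectors.  */
  return TARGET_NEON ? (16 | 8) : 0;
}

#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
  arm_autovectorize_vector_sizes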
I'm curious: which platform has non-power-of-two sized vectors?
I think I was muddling up non-power-of-two vectors with vector "group" sizes used by e.g. vld3/vst3, but of course NEON doesn't have the former, and the latter are entirely irrelevant in this context. So, apologies for the noise :-).
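(To illustrate what I meant by "group" sizes: vld3/vst3 work on groups of three interleaved elements, as in the hypothetical intrinsics snippet below, but each individual vector is still an ordinary 8- or 16-byte one.)

#include <arm_neon.h>

/* vld3 loads three interleaved streams (e.g. R, G, B bytes) into
   three D registers at once: a "group" of 3, even though each
   vector on its own is a plain 8-byte one.  */
void
split_rgb (const uint8_t *src, uint8_t *r, uint8_t *g, uint8_t *b)
{
  uint8x8x3_t pix = vld3_u8 (src);
  vst1_u8 (r, pix.val[0]);
  vst1_u8 (g, pix.val[1]);
  vst1_u8 (b, pix.val[2]);
}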
(I did come across a MIPS variant with length-3 vectors at one point, but of course that's pretty much irrelevant here also. AFAIK it's not even publicly documented.)
I wonder if this allows us to remove -mvectorize-with-neon-quad already (or, perhaps, wire it on but make the option a no-op, for possible backward-compatibility reasons)?
What vector size do we want as the default? We can make 128 the default and fall back to 64 if vectorization fails for some reason (or we can always start with 64 and switch to 128 if necessary). We could also add -mvectorize-with-neon-double, and use it together with -mvectorize-with-neon-quad to set the vector size and prevent auto-detection.
I think it's probably fine to default to 128-bit vectors, and fall back to 64-bits when necessary (where access patterns block usage of wider vectors, or similar). AIUI, ARM were quite keen to get rid of -mvectorize-with-neon-quad altogether, so I'm not sure it makes sense to add a new -double option also: particularly since with widening/narrowing operations, both vector sizes are generally needed simultaneously.
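(As a concrete example of the widening point: in a hypothetical loop like the one below, the widening multiply (vmull.s16) reads 64-bit vectors of shorts and writes 128-bit vectors of ints, so D and Q registers are both in use whichever size we nominally "prefer".)

/* Each vmull.s16 consumes a 64-bit vector of shorts and produces a
   128-bit vector of ints, so both register widths are needed.  */
void
widen_mult (int *__restrict__ out, const short *__restrict__ a,
            const short *__restrict__ b, int n)
{
  int i;
  for (i = 0; i < n; i++)
    out[i] = (int) a[i] * (int) b[i];
}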
The best solution would be to evaluate the costs for both size options, but that is a fair amount of work. Also, the unknown loop bound case will require versioning between the two vector options, in addition to the possible versioning between vector and scalar loops.
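(Just to spell out what that versioning would look like: very roughly the structure below, with the vector bodies written as plain C for clarity rather than as the NEON code the vectorizer would actually emit. The runtime checks and the extra copies of the loop are the cost we'd be paying.)

/* Rough sketch of versioning between a 128-bit loop, a 64-bit loop
   and a scalar loop for an unknown trip count n.  */
void
vadd (float *__restrict__ out, const float *__restrict__ a,
      const float *__restrict__ b, int n)
{
  int i = 0;

  if (n >= 4)
    /* "128-bit" version: four elements per iteration.  */
    for (; i + 4 <= n; i += 4)
      {
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
        out[i + 2] = a[i + 2] + b[i + 2];
        out[i + 3] = a[i + 3] + b[i + 3];
      }
  else if (n >= 2)
    /* "64-bit" version: two elements per iteration.  */
    for (; i + 2 <= n; i += 2)
      {
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
      }

  /* Scalar epilogue, and fallback for tiny n.  */
  for (; i < n; i++)
    out[i] = a[i] + b[i];
}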
I don't know if we can make a decision without tuning, especially since:
- NEON hardware available at the time (Cortex-A8) only processed data in 64-bit chunks, so Q-reg operations weren't necessarily any faster than D-reg operations (that may still be true).
This is why I thought that starting with the option to switch to 64 if 128 fails (with the -mvectorize-with-neon-quad flag) would be the least intrusive approach.
I'm not sure. The best option may well depend on the particular core (A8 vs A9 vs A15), and users will generally want to have the right option (whatever that turns out to be) as the default, without having to grub around in the documentation.
(Maybe if we make -mvectorize-with-neon-quad "wired-on" but otherwise a no-op, we could add e.g. a --param to say "prefer 64-bit vectors" or "prefer 128-bit vectors" (falling back to 64-bit as necessary), for benchmarking purposes and/or intrepid users.)
CC'ing Richard E., in case he has any input.
Cheers,
Julian