On Mon, 1 Nov 2010 15:57:11 +0200 Ira Rosen IRAR@il.ibm.com wrote:
It looks like it's enough to implement targetm.vectorize. autovectorize_vector_sizes for NEON in order to enable initial auto-detection of vector size. With the attached patch and -mvectorize-with-neon-quad flag, the vectorizer first tries to vectorize for 128 bit, and if this fails, it tries to vectorize for 64 bit. For example, in the attached testcase number of iterations is too small for 128 bit (first 2 iterations have to be peeled in order to align the array accesses), but is sufficient for 64 bit (the accesses are aligned here).
Cool, I hadn't found out about that hook yet (nor that UNITS_PER_SIMD_WORD had disappeared in favour of TARGET_VECTORIZE_PREFERRED_SIMD_MODE!).
I'd appreciate your comments on the patch, and I also have a few questions: 1. Why the default vector size is 64?
Ah, well, that was a decision made several years ago, and may no longer be the optimal choice. There are several possible reasons:
1. NEON hardware available at the time (Cortex-A8) only processed data in 64-bit chunks, so Q-reg operations weren't necessarily any faster than D-reg operations (that may still be true).
2. Initially misaligned operations were not supported, so larger vectors increase the chances that loop versioning would be required.
3. Generally, setup/tear-down code is assumed to be larger for wider vectors, and ARM hardware generally isn't very well-endowed with cache.
The choice wasn't supported by benchmarking, since IIRC we didn't have real hardware when we implemented initial NEON support, and nobody's revisited the decision since.
- Where is the place of NEON vectorization tests? I found NEON tests
with intrinsics at gcc.target/arm, is that the right place?
Looks like a good place to me. It might be an idea to avoid putting stuff in gcc.target/arm/neon, since most of those tests are auto-generated.
- According to gcc.dg/vect/vect.exp the only flag that is used for
NEON (in addition to target independent flags) is -ffast-math. Is that enough?
I'm not sure what you're asking here: there aren't any other target-specific flags to tweak NEON behaviour. Generally to enable NEON you need to set the float ABI and FPU variant (if they're not configured-in to the compiler), e.g.:
-mfloat-abi=softfp/-mfloat-abi=hard -mfpu=neon* [-march=armv7-a]
* there are several variants for this, e.g. neon, neon-fp16, neon-vfpv4 ... generally -mfpu=neon should do though, for Cortex-A8 chips at least.
- config/arm/arm.c (arm_autovectorize_vector_sizes): New
function. (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Define.
Looks fine to me (though I'm not an ARM maintainer, so can't approve it upstream). It's a bit of a pity the hook is defined in such a way that it doesn't support non-power-of-two sized vectors, but never mind...
I wonder if this allows us to remove -mvectorize-with-neon-quad already (or, perhaps, wire it on but make the option a no-op, for possible backward-compatibility reasons)?
Thanks,
Julian