Julian Brown <julian@codesourcery.com> wrote on 05/11/2010 12:58:14 PM:
I think it's probably fine to default to 128-bit vectors, and fall back to 64 bits when necessary (where access patterns block usage of wider vectors, or similar). AIUI, ARM were quite keen to get rid of -mvectorize-with-neon-quad altogether, so I'm not sure it makes sense to add a new -double option also: particularly since with widening/narrowing operations, both vector sizes are generally needed simultaneously.
Right, mixed vector sizes make it irrelevant.
The best solution would be to evaluate costs for both size options. And it is a reasonable amount of work to do that. But the unknown loop bound case will require versioning between two vector options in addition to possible versioning between vector/scalar loops.
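To make the versioning point concrete, here is a minimal sketch in plain C of a loop with an unknown bound that is versioned three ways at run time. Everything in it is hypothetical: the names pick_loop_version and add_floats, and the trip-count thresholds, which only stand in for a real cost comparison between the two vector sizes. The inner lane loop stands in for a single 64-bit or 128-bit vector operation; no NEON intrinsics are used.

```c
#include <assert.h>
#include <stddef.h>

enum loop_version { LOOP_SCALAR, LOOP_VEC64, LOOP_VEC128 };

/* Hypothetical selection rule standing in for a real cost model:
   use the 128-bit version when at least one full 4-float iteration
   exists, the 64-bit version when at least 2 floats exist, and the
   scalar loop otherwise.  */
static enum loop_version
pick_loop_version (size_t n)
{
  if (n >= 4)
    return LOOP_VEC128;
  if (n >= 2)
    return LOOP_VEC64;
  return LOOP_SCALAR;
}

/* dst[i] = a[i] + b[i] for i in [0, n), with N unknown at compile time.  */
static void
add_floats (float *dst, const float *a, const float *b, size_t n)
{
  size_t i = 0, lane, step;

  assert (n == 0 || (dst != NULL && a != NULL && b != NULL));

  switch (pick_loop_version (n))
    {
    case LOOP_VEC128: step = 4; break;  /* four floats per Q register */
    case LOOP_VEC64:  step = 2; break;  /* two floats per D register */
    default:          step = 1; break;  /* scalar loop */
    }

  /* Main loop: each outer iteration models one vector operation.  */
  for (; i + step <= n; i += step)
    for (lane = 0; lane < step; lane++)
      dst[i + lane] = a[i + lane] + b[i + lane];

  /* Scalar epilogue for the elements left over.  */
  for (; i < n; i++)
    dst[i] = a[i] + b[i];
}
```

The run-time check in pick_loop_version is the extra versioning step: it comes on top of whatever check already chooses between the vector and scalar loops.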
I don't know if we can make a decision without tuning, especially since
- NEON hardware available at the time (Cortex-A8) only processed data in 64-bit chunks, so Q-reg operations weren't necessarily any faster than D-reg operations (that may still be true).
This is why I thought that starting from the option to switch to 64 if 128 fails (with -mvectorize-with-neon-quad flag) is the least intrusive.
I'm not sure. The best option may well depend on the particular core (A8 vs A9 vs A15), and users will generally want to have the right option (whatever that turns out to be) as the default, without having to grub around in the documentation.
(Maybe if we make -mvectorize-with-neon-quad "wired-on" but otherwise a no-op,
Since TARGET_NEON_VECTORIZE_QUAD is only used in arm_preferred_simd_mode and arm_autovectorize_vector_sizes, we can simply remove it, making 128 the default. (I am not sure I fully understand '"wired-on" but otherwise a no-op'...)
Index: config/arm/arm.c
===================================================================
--- config/arm/arm.c    (revision 166032)
+++ config/arm/arm.c    (working copy)
@@ -246,6 +246,7 @@ static bool arm_builtin_support_vector_misalignmen
                                                      const_tree type,
                                                      int misalignment,
                                                      bool is_packed);
+static unsigned int arm_autovectorize_vector_sizes (void);
 
 /* Table of machine attributes.  */
@@ -391,6 +392,9 @@ static const struct default_options arm_option_opt
 #define TARGET_VECTOR_MODE_SUPPORTED_P arm_vector_mode_supported_p
 #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
 #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE arm_preferred_simd_mode
+#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
+#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
+  arm_autovectorize_vector_sizes
 
 #undef TARGET_MACHINE_DEPENDENT_REORG
 #define TARGET_MACHINE_DEPENDENT_REORG arm_reorg
@@ -22025,15 +22029,14 @@ arm_preferred_simd_mode (enum machine_mode mode)
   switch (mode)
     {
     case SFmode:
-      return TARGET_NEON_VECTORIZE_QUAD ? V4SFmode : V2SFmode;
+      return V4SFmode;
     case SImode:
-      return TARGET_NEON_VECTORIZE_QUAD ? V4SImode : V2SImode;
+      return V4SImode;
     case HImode:
-      return TARGET_NEON_VECTORIZE_QUAD ? V8HImode : V4HImode;
+      return V8HImode;
     case QImode:
-      return TARGET_NEON_VECTORIZE_QUAD ? V16QImode : V8QImode;
+      return V16QImode;
     case DImode:
-      if (TARGET_NEON_VECTORIZE_QUAD)
       return V2DImode;
       break;
 
@@ -23223,6 +23226,12 @@ arm_expand_sync (enum machine_mode mode,
     }
 }
 
+static unsigned int
+arm_autovectorize_vector_sizes (void)
+{
+  return 16 | 8;
+}
+
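For illustration only, here is a minimal sketch of how a caller might consume the mask returned by arm_autovectorize_vector_sizes, trying 16 bytes first and falling back to 8. It assumes each set bit is a vector size in bytes, as "16 | 8" suggests. The helpers widest_vector_size and choose_vector_size and the two sample callbacks are hypothetical and not part of the patch; this is not a copy of the vectorizer's actual code.

```c
#include <assert.h>

/* Hypothetical helper: SIZES is a bitwise OR of vector sizes in bytes.
   Return the widest size present in the mask, or 0 if it is empty.  */
static unsigned int
widest_vector_size (unsigned int sizes)
{
  unsigned int size = 0;
  unsigned int bit;

  for (bit = 1; bit != 0; bit <<= 1)
    if (sizes & bit)
      size = bit;
  return size;
}

/* Hypothetical driver: try each size in SIZES from widest to narrowest
   and return the first one TRY_SIZE accepts, or 0 if every size fails
   (meaning: keep the scalar loop).  */
static unsigned int
choose_vector_size (unsigned int sizes, int (*try_size) (unsigned int))
{
  assert (try_size != 0);

  while (sizes != 0)
    {
      unsigned int size = widest_vector_size (sizes);

      if (try_size (size))
        return size;
      sizes &= ~size;  /* This size failed; drop it and retry.  */
    }
  return 0;
}

/* Sample callbacks standing in for "can this loop be vectorized with
   vectors of SIZE bytes?".  */
static int
any_size_ok (unsigned int size)
{
  return size != 0;
}

static int
only_64_bit_ok (unsigned int size)
{
  return size == 8;
}
```

With the mask from the patch, choose_vector_size (16 | 8, any_size_ok) picks 16, while choose_vector_size (16 | 8, only_64_bit_ok) falls back to 8, which is the "switch to 64 if 128 fails" behaviour.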
we could add e.g. a --param to say "prefer 64-bit vectors" or "prefer 128-bit vectors" (falling back to 64-bit as necessary), for benchmarking purposes and/or intrepid users.)
ARM-specific param?
Thanks, Ira
CC'ing Richard E., in case he has any input.
Cheers,
Julian