Re: Auto-detection of vector size for NEON

8 Nov 2010

Julian Brown <julian@codesourcery.com> wrote on 05/11/2010 12:58:14 PM:
...
I think it's probably fine to default to 128-bit vectors, and fall back
to 64-bits when necessary (where access patterns block usage of wider
vectors, or similar). AIUI, ARM were quite keen to get rid of
-mvectorize-with-neon-quad altogether, so I'm not sure it makes sense
to add a new -double option also: particularly since with
widening/narrowing operations, both vector sizes are generally needed
simultaneously.
Right, mixed vector sizes make it irrelevant.
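
To illustrate the mixed-size point, here is a minimal sketch of a widening loop in plain C. The function name widen_mult is made up for this example and does not come from GCC or the testsuite. With four elements per vector iteration, the 16-bit inputs fill 64 bits (a D register) while the 32-bit results fill 128 bits (a Q register), so a single vectorized loop of this shape would plausibly use both sizes at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example loop: 16-bit inputs, 32-bit results.
   Four lanes of int16_t occupy 4 * 16 = 64 bits, while four lanes
   of int32_t occupy 4 * 32 = 128 bits.  */
void
widen_mult (const int16_t *a, const int16_t *b, int32_t *out, size_t n)
{
  assert (a != NULL && b != NULL && out != NULL);

  for (size_t i = 0; i < n; i++)
    out[i] = (int32_t) a[i] * (int32_t) b[i];
}
```
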
...
...
The best solution would be to evaluate costs for both size options.
And it is a reasonable amount of work to do that. But the unknown
loop bound case will require versioning between two vector options in
addition to possible versioning between vector/scalar loops.
I don't know if we can make a decision without tuning, especially
since
...
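
As a rough sketch of what versioning between two vector options plus a scalar loop could look like for an unknown loop bound: the enum, the function choose_version and the "can the trip count fill one vector" rule below are all assumptions made for illustration. They are not GCC's cost model, which would have to weigh far more than the trip count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the three loop versions.  */
enum loop_version { LOOP_SCALAR, LOOP_VEC64, LOOP_VEC128 };

/* Pick the widest version whose lanes the trip count N can fill at
   least once.  ELEM_SIZE is the element size in bytes and must be
   non-zero.  A "vector" with fewer than two lanes is not considered.  */
enum loop_version
choose_version (size_t n, size_t elem_size)
{
  assert (elem_size != 0);

  size_t lanes128 = 16 / elem_size;
  size_t lanes64 = 8 / elem_size;

  if (lanes128 >= 2 && n >= lanes128)
    return LOOP_VEC128;
  if (lanes64 >= 2 && n >= lanes64)
    return LOOP_VEC64;
  return LOOP_SCALAR;
}
```
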

NEON hardware available at the time (Cortex-A8) only processed
data in 64-bit chunks, so Q-reg operations weren't necessarily any
faster than D-reg operations (that may still be true).
This is why I thought that starting from the option to switch to 64
if 128 fails (with -mvectorize-with-neon-quad flag) is the least
intrusive.
I'm not sure. The best option may well depend on the particular core
(A8 vs A9 vs A15), and users will generally want to have the right
option (whatever that turns out to be) as the default, without having
to grub around in the documentation.
(Maybe if we make -mvectorize-with-neon-quad "wired-on" but otherwise a
no-op,
Since TARGET_NEON_VECTORIZE_QUAD is only used in arm_preferred_simd_mode
and arm_autovectorize_vector_sizes, we can simply remove it, making 128 the
default. (I am not sure I fully understand "'wired-on' but otherwise a
no-op"...)
Index: config/arm/arm.c
===================================================================
--- config/arm/arm.c    (revision 166032)
+++ config/arm/arm.c    (working copy)
@@ -246,6 +246,7 @@ static bool arm_builtin_support_vector_misalignmen
                                                     const_tree type,
                                                     int misalignment,
                                                     bool is_packed);
+static unsigned int arm_autovectorize_vector_sizes (void);
 /* Table of machine attributes.  */
@@ -391,6 +392,9 @@ static const struct default_options arm_option_opt
 #define TARGET_VECTOR_MODE_SUPPORTED_P arm_vector_mode_supported_p
 #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
 #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE arm_preferred_simd_mode
+#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
+#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
+  arm_autovectorize_vector_sizes
 #undef  TARGET_MACHINE_DEPENDENT_REORG
 #define TARGET_MACHINE_DEPENDENT_REORG arm_reorg
@@ -22025,15 +22029,14 @@ arm_preferred_simd_mode (enum machine_mode mode)
     switch (mode)
       {
       case SFmode:
-       return TARGET_NEON_VECTORIZE_QUAD ? V4SFmode : V2SFmode;
+       return V4SFmode;
       case SImode:
-       return TARGET_NEON_VECTORIZE_QUAD ? V4SImode : V2SImode;
+       return V4SImode;
       case HImode:
-       return TARGET_NEON_VECTORIZE_QUAD ? V8HImode : V4HImode;
+       return V8HImode;
       case QImode:
-       return TARGET_NEON_VECTORIZE_QUAD ? V16QImode : V8QImode;
+       return V16QImode;
       case DImode:
-       if (TARGET_NEON_VECTORIZE_QUAD)
          return V2DImode;
        break;
@@ -23223,6 +23226,12 @@ arm_expand_sync (enum machine_mode mode,
     }
 }
+static unsigned int
+arm_autovectorize_vector_sizes (void)
+{
+  return 16 | 8;
+}
+
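
For readers unfamiliar with the hook: as I understand it, the return value is a bitmask of vector sizes in bytes, so 16 | 8 means "128-bit and 64-bit", and the vectorizer tries the remaining sizes after the preferred SIMD mode fails. The helper below only illustrates how such a mask decodes widest-first. decode_vector_sizes is a hypothetical name and is not part of GCC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write the sizes (in bytes) encoded in MASK
   into SIZES, widest first, storing at most MAX entries.  Returns
   the number of entries written.  */
unsigned int
decode_vector_sizes (unsigned int mask, unsigned int *sizes,
                     unsigned int max)
{
  unsigned int count = 0;

  assert (sizes != NULL);

  /* Walk from the highest bit down so wider vectors come first.  */
  for (unsigned int bit = 1u << 31; bit != 0 && count < max; bit >>= 1)
    if (mask & bit)
      sizes[count++] = bit;

  return count;
}
```

With the mask from the patch, the helper yields 16 first and then 8, which matches the "default to 128, fall back to 64" behaviour discussed above.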
...
we could add e.g. a --param to say "prefer 64-bit vectors" or
"prefer 128-bit vectors" (falling back to 64-bit as necessary), for
benchmarking purposes and/or intrepid users.)
ARM specific param?
Thanks,
Ira
...
CC'ing Richard E., in case he has any input.
Cheers,
Julian

    
