Re: [PATCH, WIP] NEON quadword vectors in big-endian mode (#10061, #7306)

30 Nov 2010


On Wed, 17 Nov 2010 11:21:03 +0000
Julian Brown julian@codesourcery.com wrote:

> ...
> Shouldn't be more than 1-2 days, but I've been distracted by other
> bugs...

Well, this took a little longer than I thought. I think it's a good
incremental improvement though: it should improve code in a few cases
in little-endian mode, as well as fixing some of the bugs in big-endian
mode.

I wrote down a few rules regarding element numberings and subregs, which
might make the patch easier to understand:

1. For D-sized subregs of Q registers:

     Offset 0: least-significant part (always)
     Offset 8: most-significant part (always)

   By my (possibly incorrect) reading of:

     http://gcc.gnu.org/onlinedocs/gccint/Regs-and-Memory.html#index-subreg-2013

   this is true because the byte offsets follow memory ordering: in either
   little-endian or big-endian mode, the less-significant D register in a
   pair comprising a Q register corresponds to a lower memory address.

2. For lane access we should use:

   Big-endian mode,

     least significant        ....       most significant
     Dn[halfelts-1]..Dn[0]  D(n+1)[halfelts-1]..D(n+1)[0]

   Little-endian mode,

     least significant        ....       most significant
     Dn[0]..Dn[halfelts-1]  D(n+1)[0]..D(n+1)[halfelts-1]

3. GCC's expectation for lane-access is that:

   Big-endian mode,

     least significant        ....       most significant
     D[elts-1] or Q[elts-1]     ....         D[0] or Q[0]

   Little-endian mode,

     least significant        ....       most significant
     D[0] or Q[0]      ....        D[elts-1] or Q[elts-1]

4. "Lo" refers to the least-significant part of a vector register, and
   "hi" refers to the most-significant part.

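Rules 1-3 compose into a mapping from a GCC lane number to a D register
and a lane within it. The following is a minimal standalone C sketch of
that composition, not code from the patch: the struct and function names
are hypothetical, and a Q register is modelled simply as the pair Dn
(less significant) and D(n+1) (more significant).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical type: one lane of one D register within a Q register.
   dreg 0 is Dn (less-significant half), dreg 1 is D(n+1).  */
struct d_lane
{
  int dreg;
  int lane;
};

/* Rule 1: a D-sized subreg at byte offset 0 is the least-significant
   half and offset 8 the most-significant half, in either endianness.  */
static int
subreg_offset_to_dreg (int byte_offset)
{
  assert (byte_offset == 0 || byte_offset == 8);
  return byte_offset / 8;
}

/* Rule 3: convert a GCC lane number to a significance position, where
   position 0 is the least-significant element.  */
static int
gcc_lane_to_position (int gcc_lane, int elts, bool big_endian)
{
  assert (gcc_lane >= 0 && gcc_lane < elts);
  return big_endian ? elts - 1 - gcc_lane : gcc_lane;
}

/* Rule 2: convert a significance position to a D register and a lane
   within that D register.  */
static struct d_lane
position_to_d_lane (int pos, int elts, bool big_endian)
{
  int halfelts = elts / 2;
  struct d_lane r;

  r.dreg = pos / halfelts;
  r.lane = big_endian ? halfelts - 1 - pos % halfelts : pos % halfelts;
  return r;
}

/* Rules 2 and 3 combined: GCC lane number to D register and lane.  */
static struct d_lane
gcc_lane_to_d_lane (int gcc_lane, int elts, bool big_endian)
{
  int pos = gcc_lane_to_position (gcc_lane, elts, big_endian);
  return position_to_d_lane (pos, elts, big_endian);
}
```

For a four-element Q vector in big-endian mode, this maps GCC lanes
0, 1, 2, 3 to D(n+1)[0], D(n+1)[1], Dn[0], Dn[1]. In little-endian mode
the same lanes map to Dn[0], Dn[1], D(n+1)[0], D(n+1)[1].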
The patch touches quite a number of patterns to update to the "new"
interpretation of lane numbers as outlined above. A couple of things
remain a little awkward:

* Two places in generic code (supportable_widening_operation and
  vect_permute_store_chain) depend on the endianness of the target in
  ways which don't seem to match (4) above: "lo" and "hi" have their
  meanings exchanged in big-endian mode. I've worked around this by
  introducing a VECTOR_ELEMENTS_BIG_ENDIAN macro, which is always false
  on ARM, though I'm not very happy with it. The effect is simply to
  stop the "swapping" happening for ARM. I suspect the middle-end code
  which does this swapping is incorrect to start with.

* I've used subregs to avoid emitting useless "vmov" instructions in
  many cases, but this causes problems (in combine) with unpack
  instructions which use vector sign_extend/zero_extend operations.
  I think something in combine doesn't understand that the zero/sign
  extensions happen element-wise, and so considers them to be somehow
  equivalent to the subreg op. I'm not entirely sure what's causing
  this, so I've just worked around it using UNSPEC for now (marked
  with FIXME).

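To illustrate why the element-wise distinction in the second point
matters, here is a small standalone C sketch. It is not GCC code, and
it is only one plausible reading of the confusion: the function names
are hypothetical, and the vectors are modelled as plain arrays. It
compares widening each 8-bit element to 16 bits against extending the
64-bit value as a whole, which leaves the original bytes untouched and
only adds zero padding.

```c
#include <stdint.h>
#include <string.h>

/* Element-wise zero-extension: each 8-bit lane becomes its own
   16-bit lane, so zero bytes are interleaved with the data bytes.  */
static void
zero_extend_lanes (const uint8_t in[8], uint16_t out[8])
{
  for (int i = 0; i < 8; i++)
    out[i] = in[i];
}

/* Whole-value extension: copies the eight input bytes unchanged and
   zero-fills the remaining eight bytes.  */
static void
zero_extend_whole (const uint8_t in[8], uint8_t out[16])
{
  memset (out, 0, 16);
  memcpy (out, in, 8);
}

/* Return nonzero if both extensions yield the same 16 bytes.  */
static int
extensions_agree (const uint8_t in[8])
{
  uint16_t lanes[8];
  uint8_t whole[16];

  zero_extend_lanes (in, lanes);
  zero_extend_whole (in, whole);
  return memcmp (lanes, whole, sizeof whole) == 0;
}
```

For an input such as {1, 2, 3, 4, 5, 6, 7, 8} the two results differ on
both little-endian and big-endian hosts, so treating the element-wise
extension as a reinterpretation of the same bits would give wrong
results.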
Tested very lightly (vect.exp only, little/big-endian, with and without
-mvectorize-with-neon-quad). Currently against CS's trunk branch. Any
comments or suggestions? I'm not sure where I should apply this if it's
OK and passes more thorough testing...

Cheers,

Julian

ChangeLog

gcc/
    * defaults.h (VECTOR_ELEMENTS_BIG_ENDIAN): Define.
    * tree-vect-data-refs.c (vect_permute_store_chain): Use
    VECTOR_ELEMENTS_BIG_ENDIAN.
    * tree-vect-stmts.c (supportable_widening_operation): Likewise.
    * config/arm/arm.c (arm_can_change_mode_class): New.
    * config/arm/arm.h (VECTOR_ELEMENTS_BIG_ENDIAN): New.
    (CANNOT_CHANGE_MODE_CLASS): Use arm_can_change_mode_class.
    * config/arm/arm-protos.h (arm_can_change_mode_class): Add
    prototype.
    * config/arm/neon.md (SE_magic): New code attribute.
    (vec_extract<mode>): Alter element numbering used for extract
    operations in big-endian mode.
    (vec_shr_<mode>, vec_shl_<mode>): Disable in big-endian mode.
    (neon_move_lo_quad_<mode>, neon_move_hi_quad_<mode>): Remove.
    (move_hi_quad_<mode>, move_lo_quad_<mode>): Use subregs.
    (neon_vec_unpack<US>_lo_mode, neon_vec_unpack<US>_hi_mode): Use
    s_register_operand, fix output formatting.
    (vec_unpack<US>_hi_<mode>, vec_unpack<US>_lo_<mode>): Fix for
    big-endian mode.
    (neon_vec_<US>mult_lo_<mode>, neon_vec_<US>mult_hi_<mode>): Use
    s_register_operand, fix output formatting.
    (vec_widen_<US>mult_lo_<mode>, vec_widen_<US>mult_hi_<mode>): Fix
    for big-endian mode.
    (neon_unpack_<US>_mode): Use s_register_operand.
    (vec_unpack<US>_lo_<mode>, vec_unpack<US>_hi_<mode>): Use subregs
    instead of neon_vget_low/high. Work around combiner breakage.
    (neon_vec_<US>mult_<mode>): (D reg version) use s_register_operand.
    (vec_widen_<US>mult_hi_<mode>, vec_widen_<US>mult_lo_<mode>):
    Similar (D reg versions).
    (vec_pack_trunc_<mode>): (D reg version) Change to expander. Use
    s_register_operand. Use vector subregs.
    (neon_vec_pack_trunc_<mode>): Use s_register_operand.
    (vec_pack_trunc_<mode>): (Q reg version) Use s_register_operand. Fix
    for big-endian mode.
