On Wed, 17 Nov 2010 11:21:03 +0000 Julian Brown julian@codesourcery.com wrote:
> Shouldn't be more than 1-2 days, but I've been distracted by other bugs...
Well, this took a little longer than I thought. I think it's a good incremental improvement though: it should improve code in a few cases in little-endian mode, as well as fixing some of the bugs in big-endian mode.
I wrote down a few rules regarding element numberings and subregs, which might make the patch easier to understand:
1. For D-sized subregs of Q registers:
     Offset 0: least-significant part (always)
     Offset 8: most-significant part (always)
By my (possibly incorrect) reading of:
http://gcc.gnu.org/onlinedocs/gccint/Regs-and-Memory.html#index-subreg-2013
This is true because the byte offsets follow memory ordering: in either little-endian or big-endian mode, the less-significant D register in a pair comprising a Q register corresponds to a lower memory address.
2. For lane access we should use:
   Big-endian mode,

     least significant      ....      most significant
     Dn[halfelts-1]..Dn[0]   D(n+1)[halfelts-1]..D(n+1)[0]

   Little-endian mode,

     least significant      ....      most significant
     Dn[0]..Dn[halfelts-1]   D(n+1)[0]..D(n+1)[halfelts-1]
3. GCC's expectation for lane-access is that:
   Big-endian mode,

     least significant      ....      most significant
     D[elts-1] or Q[elts-1]  ....  D[0] or Q[0]

   Little-endian mode,

     least significant      ....      most significant
     D[0] or Q[0]  ....  D[elts-1] or Q[elts-1]
4. "Lo" refers to the least-significant part of a vector register, and "hi" refers to the most-significant part.
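To make rules 1-3 concrete, here is a small standalone C sketch of how I read them. It is not code from the patch: the struct and function names are made up for illustration, and it only models the numbering, composing rule 3 (GCC lane to significance position) with rule 2 (significance position to D register and lane).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper type (not from the patch): which D register of the
   pair making up a Q register, and which lane within that D register.  */
struct d_lane {
    int d_index;  /* 0 = Dn (less-significant half), 1 = D(n+1) */
    int lane;     /* lane number within that D register */
};

/* Rule 1: byte offset of a D-sized subreg of a Q register.  The offset
   is the same in little-endian and big-endian mode.  */
static int d_subreg_byte_offset(bool most_significant_half)
{
    return most_significant_half ? 8 : 0;
}

/* Rule 3: position of a GCC lane number, counted from the
   least-significant element (position 0).  */
static int gcc_lane_to_position(int gcc_lane, int elts, bool big_endian)
{
    assert(gcc_lane >= 0 && gcc_lane < elts);
    return big_endian ? elts - 1 - gcc_lane : gcc_lane;
}

/* Rule 2: D register and lane holding the element at a given
   significance position of a Q register.  */
static struct d_lane position_to_d_lane(int pos, int elts, bool big_endian)
{
    struct d_lane r;
    int halfelts = elts / 2;
    int within;

    assert(elts >= 2 && elts % 2 == 0 && pos >= 0 && pos < elts);
    r.d_index = pos >= halfelts;  /* upper positions live in D(n+1) */
    within = pos % halfelts;      /* position inside that D register */
    r.lane = big_endian ? halfelts - 1 - within : within;
    return r;
}

/* Rules 2 and 3 composed: where does GCC's lane number end up?  */
static struct d_lane gcc_lane_to_d_lane(int gcc_lane, int elts, bool big_endian)
{
    int pos = gcc_lane_to_position(gcc_lane, elts, big_endian);
    return position_to_d_lane(pos, elts, big_endian);
}
```

With four elements in a Q register, this model puts GCC lane 0 in Dn[0] in little-endian mode but in D(n+1)[0] in big-endian mode, which is the kind of difference the pattern changes have to account for.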
The patch touches quite a number of patterns to update to the "new" interpretation of lane numbers as outlined above. A couple of things remain a little awkward:
* Two places in generic code depend on the endianness of the target in ways which don't seem to match (4) above (supportable_widening_operation and vect_permute_store_chain): "lo" and "hi" have their meanings exchanged in big-endian mode. I've worked around this by introducing a VECTOR_ELEMENTS_BIG_ENDIAN macro, which is always false on ARM: I'm not very happy with it though. The effect is simply to stop the "swapping" happening for ARM. I vaguely think the middle-end code which does this swapping is probably incorrect to start with.
* I've used subregs to avoid emitting useless "vmov" instructions in many cases, but this causes problems (in combine) with unpack instructions which use vector sign_extend/zero_extend operations: basically I think that something in combine doesn't understand that the zero/sign extensions are happening element-wise, thus considers them to be somehow equivalent to the subreg op. I'm not entirely sure what's causing this, so I've just worked around it using UNSPEC for now (marked with FIXME).
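As a rough illustration of the first point, the following standalone C model shows the effect VECTOR_ELEMENTS_BIG_ENDIAN is meant to have. It is only a sketch: the struct, enum and function names are hypothetical, the real macro is a compile-time target macro rather than a runtime field, and the assumption that targets other than ARM keep following byte endianness is mine.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a target (not GCC code).  */
struct target_model {
    bool bytes_big_endian;            /* stands in for BYTES_BIG_ENDIAN */
    bool vector_elements_big_endian;  /* stands in for VECTOR_ELEMENTS_BIG_ENDIAN */
};

enum vec_half { HALF_LO, HALF_HI };

/* Assumed behaviour for targets that do not override the macro:
   follow the byte order of the target.  */
static struct target_model generic_target(bool big_endian)
{
    struct target_model t = { big_endian, big_endian };
    return t;
}

/* ARM as described above: the macro is always false, whatever the
   byte order.  */
static struct target_model arm_target(bool big_endian)
{
    struct target_model t = { big_endian, false };
    return t;
}

/* Which "half" operation the vectorizer would use for the first result
   vector of a widening operation.  When the macro is true, "lo" and
   "hi" are exchanged; when it is false, no swapping happens.  */
static enum vec_half first_half_for_widening(const struct target_model *t)
{
    return t->vector_elements_big_endian ? HALF_HI : HALF_LO;
}
```

In this model a generic big-endian target gets HALF_HI first, while big-endian ARM gets HALF_LO first, the same as in little-endian mode, so "lo" keeps the meaning given in (4).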
Tested very lightly (vect.exp only, little/big-endian, with and without -mvectorize-with-neon-quad). Currently against CS's trunk branch. Any comments or suggestions? I'm not sure where I should apply this if it's OK and passes more thorough testing...
Cheers,
Julian
ChangeLog
    gcc/
    * defaults.h (VECTOR_ELEMENTS_BIG_ENDIAN): Define.
    * tree-vect-data-refs.c (vect_permute_store_chain): Use
    VECTOR_ELEMENTS_BIG_ENDIAN.
    * tree-vect-stmts.c (supportable_widening_operation): Likewise.
    * config/arm/arm.c (arm_can_change_mode_class): New.
    * config/arm/arm.h (VECTOR_ELEMENTS_BIG_ENDIAN): New.
    (CANNOT_CHANGE_MODE_CLASS): Use arm_can_change_mode_class.
    * config/arm/arm-protos.h (arm_can_change_mode_class): Add
    prototype.
    * config/arm/neon.md (SE_magic): New code attribute.
    (vec_extract<mode>): Alter element numbering used for extract
    operations in big-endian mode.
    (vec_shr_<mode>, vec_shl_<mode>): Disable in big-endian mode.
    (neon_move_lo_quad_<mode>, neon_move_hi_quad_<mode>): Remove.
    (move_hi_quad_<mode>, move_lo_quad_<mode>): Use subregs.
    (neon_vec_unpack<US>_lo_move, neon_vec_unpack<US>_hi_mode): Use
    s_register_operand, fix output formatting.
    (vec_unpack<US>_hi_<mode>, vec_unpack<US>_lo_<mode>): Fix for
    big-endian mode.
    (neon_vec_<US>mult_lo_<mode>, neon_vec_<US>mult_hi_<mode>): Use
    s_register_operand, fix output formatting.
    (vec_widen_<US>mult_lo_<mode>, vec_widen_<US>mult_hi_<mode>): Fix
    for big-endian mode.
    (neon_unpack_<US>_mode): Use s_register_operand.
    (vec_unpack<US>_lo_<mode>, vec_unpack<US>_hi_<mode>): Use subregs
    instead of neon_vget_low/high. Work around combiner breakage.
    (neon_vec_<US>mult_<mode>): (D reg version) use s_register_operand.
    (vec_widen_<US>mult_hi_<mode>, vec_widen_<US>mult_lo_<mode>):
    Similar (D reg versions).
    (vec_pack_trunc_<mode>): (D reg version) Change to expander. Use
    s_register_operand. Use vector subregs.
    (neon_vec_pack_trunc_<mode>): Use s_register_operand.
    (vec_pack_trunc_<mode>): (Q reg version) Use s_register_operand.
    Fix for big-endian mode.