Hi,
On Mon, 21 Feb 2011 17:20:52 +0000 Richard Sandiford richard.sandiford@linaro.org wrote:
One of the vectorisation discussions from last year was about the poor code GCC generates for vld{2,3,4}_*() and vst{2,3,4}_*(). It forces the result of the loads onto the stack, then loads the individual pieces from there. It does the same thing in reverse for stores.
I think there are two major problems here: [...]
I wanted to finish this off before making it public, but I think that to do so at this point is just going to result in duplicated effort. The attached patch does several things:
1. Struct (tree) types are defined via hard-wired code in the ARM backend rather than in arm_neon.h. The "type mode" of those struct types is overridden to be an extra-wide vector, the width of the whole struct (so int32x2x2_t would be V4SImode, etc.).
2. Builtins (__builtin_neon_*) which previously used "big" integer modes to pass/return values, are initialised such that they directly pass/return the struct types above instead. The intrinsic wrappers in arm_neon.h no longer need to use unions to pun the types of arguments & return values.
3. When those builtins are expanded, they now use the extra-wide vectors. The corresponding instruction patterns in neon.md also use wide vectors (rather than wide integer modes).
4. CANNOT_CHANGE_MODE_CLASS is redefined, somewhat like you describe.
The patch seems to work OK, at least for C. A couple of glaring issues remain:
1. It's only tested very lightly.
2. C++ is totally broken. The creation of "NEON" structs in the backend knows nothing of the subtle nuances of the C++ frontend's representation of structs/classes (one idea is to provide a "promotion" language hook in the C++ frontend to turn a C-like struct into a C++-like struct, but I don't know how acceptable that would be upstream).
3. There may be ABI-related issues regarding passing/returning the new backend-created struct types to/from regular functions.
The patch does not fix:
The VMOV is a bit disappointing, and needs further investigation.
But that should be fairly easy: we just need to expand to subreg operations for vcombine, vget_high/vget_low, etc. -- which I believe will work fine now (it didn't at some point in the distant past, which is why we have hardwired "vmov"s all over the place). Big-endian mode may need some care to be taken, as ever.
I don't think I'm going to have time to work on this in the immediate future: please feel free to use it as a base, or ignore it if your approach is simpler/better :-).
Cheers,
Julian
ChangeLog
gcc/ * config/arm/arm.c (arm_legitimate_address_outer_p) (thumb2_legitimate_address_p, output_move_neon): Use VALID_NEON_VEC_ARRAY_MODE instead of VALID_NEON_STRUCT_MODE. (arm_legitimate_index_p, thumb2_legitimate_index_p): Permit only zero indices for big vector modes. (arm_attr_length_move_neon): Check for explicit vector modes rather than big integer modes (OImode, etc.). (arm_hard_regno_mode_ok): Likewise. (arm_can_change_mode_class): New. Implement ARM_CANNOT_CHANGE_MODE_CLASS macro (negated). (arm_build_neon_struct_type): New. (neon_builtin_type_bits): Add unsigned/polynomial variants. Remove wide integer variants. (*_UP): Adjust accordingly. (T_MAX): Update number of variants. (*_REALMODE, REALMODE): Define macros. (CF2, CF3): New macros. (CF): Adjust to use above. (neon_builtin_data): Adjust entries for VTBL, VTBX and element/structure loads/stores to use explicit unsigned/polynomial variants where appropriate. (vec_array_struct_type_info): New. (arm_init_neon_builtins): Add explicit unsigned/polynomial types. Create structures used for passing/returning multiple vectors, and use to initialise builtins which use them. * config/arm/arm.h (VALID_NEON_STRUCT_MODE): Remove. (VALID_NEON_VEC_ARRAY_MODE): New macro. (CANNOT_CHANGE_MODE_CLASS): Remove comment. Implement using arm_can_change_mode_class. * config/arm/arm-modes.def (VECTOR_MODES (INT, ...)): Add 24-, 32-, 48- and 64-bit integer vector modes. (VECTOR_MODES (FLOAT, ...)): Likewise, for float modes. (EI, OI, CI, XI): Remove opaque integer modes. * config/arm/arm-protos.h (arm_can_change_mode_class): Add prototype. * config/arm/neon.md (VTAB, VTAB_n, V_PAIR, V_pair): Remove unused mode iterators/attributes. (VSTRUCT): Use explicit vector types instead of opaque integers. (VSTRUCT3, VSTRUCT4, VSTRUCT6, VSTRUCT8): New mode iterators. (VSTR3_LO, VSTR3_HI, VSTR_PART): New mode attributes. (V_two_elem, V_three_elem, V_four_elem): Use explicit vector types where appropriate. (V_DOUBLE): Add new wide vector types. (V_TRIPLE, V_TRIPLE_HALF, V_QUAD, V_QUAD_HALF): New. (*neon_mov<mode> splitters): Split using vector types rather than opaque integers. (neon_vtbl2v8qi, neon_vtbl3v8qi, neon_vtbl4v8qi, neon_vtbx2v8qi) (neon_vtbx3v8qi, neon_vtbx4v8qi): Use vector types not opaque integers. (neon_vld2<mode>, neon_vld2_lane<mode>, neon_vld2_dup<mode>) (neon_vst2<mode>, neon_vst2_lane<mode>, neon_vld3<mode>) (neon_vld3qa<mode>, neon_vld3qb<mode>, neon_vld3_lane<mode>) (neon_vld3_dup<mode>, neon_vst3<mode>, neon_vst3qa<mode>) (neon_vst3qb<mode>, neon_vst3_lane<mode>, neon_vld4<mode>) (neon_vld4qa<mode>, neon_vld4qb<mode>, neon_vld4_lane<mode>) (neon_vst4<mode>, neon_vst4qa<mode>, neon_vst4qb<mode>) (neon_vst4_lane<mode>): Use wide vector modes instead of opaque integer ones. * config/arm/neon.ml (features): Add Distinct_types feature. (ops): Use Distinct_types for VTBL, VTBX, element/structure load/store instruction definitions. * config/arm/neon-gen.ml (cast_for_return): Only add cast when needed. (return): When returning a multiple vectors via a struct, just return the value rather than forcing type-conversion through a union. (params): Similar, when passing a struct to a builtin. (type_suffix): New. (print_variant): Support intrinsics which use distinct types for each underlying builtin. (arrtypes): Emit typedefs to builtin struct type names, rather than actually defining structs. * config/arm/arm_neon.h: Regenerate.