Re: Improving the code generated for vld and vst intrinsics

22 Feb 2011

On Tue, 22 Feb 2011 09:42:15 +0000
Richard Sandiford richard.sandiford@linaro.org wrote:
...
Julian Brown julian@codesourcery.com writes:
...
Richard Sandiford richard.sandiford@linaro.org wrote:

Struct (tree) types are defined via hard-wired code in the ARM
backend rather than in arm_neon.h. The "type mode" of those struct
types is overridden to be an extra-wide vector, the width of the
whole struct (so int32x2x2_t would be V4SImode, etc.).
FWIW, I was going to try to avoid this.  I think instead we should
automatically use vector modes for structures like those in
arm_neon.h, via a target hook.  It would improve the code quality for
general code that has the same sort of small-array-of-vectors
structure.  (In other words, this would help when using the generic
vector extensions rather than the Neon-specific builtins.)
That sounds like a good plan, I think.
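
For illustration, here is a minimal sketch of the kind of "small-array-of-vectors" structure being discussed, written with GCC's generic vector extension rather than the Neon builtins. The names v2si, v2si_x2 and pair_add are made up for this example and are not taken from arm_neon.h or from either patch; the struct merely has the same shape as int32x2x2_t (two 64-bit vectors in a "val" array).

```c
#include <stdint.h>

/* Two-lane vector of 32-bit ints via GCC's generic vector extension
   (8 bytes wide, the same size as a Neon D register).  */
typedef int32_t v2si __attribute__((vector_size(8)));

/* Hypothetical stand-in with the same shape as int32x2x2_t:
   a struct wrapping a small array of vectors.  */
typedef struct {
    v2si val[2];
} v2si_x2;

/* Add two such structs one vector at a time.  Each statement is a
   whole-vector addition on one half of the struct.  */
v2si_x2 pair_add(v2si_x2 a, v2si_x2 b)
{
    v2si_x2 r;
    r.val[0] = a.val[0] + b.val[0];
    r.val[1] = a.val[1] + b.val[1];
    return r;
}
```

The whole struct is 16 bytes, so it is exactly as wide as a V4SImode value. That is the property a target hook of the kind proposed above could presumably exploit when picking a vector mode for the struct.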
...
...

Builtins (__builtin_neon_*) which previously used "big" integer
modes to pass/return values, are initialised such that they
directly pass/return the struct types above instead. The intrinsic
wrappers in arm_neon.h no longer need to use unions to pun the
types of arguments & return values.
Yeah, I'd wondered about that too.  However, these days, I think we
ought to be able to generate good code for this type of union, and we
seem to for the cases I've tried.  In the end I thought it was better
to keep the underlying built-in function close to the rtl pattern.
E.g. the fact that the name of the field is "val" seems more like an
arm_neon.h detail than something that should be hard-coded into GCC.
I still think it's a good idea to get rid of the unions in this case,
or at least, replace the wide-integer modes in the unions with wide
vectors. But I'm happy with whatever works :-).
...
...

When those builtins are expanded, they now use the extra-wide
vectors. The corresponding instruction patterns in neon.md also use
wide vectors (rather than wide integer modes).
I was also going to try defining non-power-of-two vectors.  Glad to
hear it works!  (That was actually the main motivation for doing the
rtl side first: to see whether it really would be OK to ask the
vectoriser to treat these values as single vectors.)
(Caveat: that's one of the things that isn't well-tested. It works to
define those modes, at least.)
...
I think we should keep the integer modes too, though, just like we
allow DImode for double registers.  (I'm surprised to see we don't
allow TImode for quad registers TBH -- might look into that.)
Given that there are no architectural restrictions on mode punning,
these integer modes are useful neutral ground.
I'm not convinced by that. Consider that:
1. There are no useful operations on the wide-integer types.
2. There is no way of representing constants in RTL for the
wide-integer types (we have an internal bug where a constant-zero
OImode value is synthesized when using NEON intrinsics, and it leads to
an ICE: see emit-rtl.c:immed_double_const -- OImode, etc. are wider
than 2 * HOST_BITS_PER_WIDE_INT, so fail the second assertion).
Unless we can show that the wide-integer modes are really needed, and
patch things up so (2) no longer holds, I'd strongly prefer to see them
disappear. (As a side note, we obviously want to avoid wide-integer or
wide-vector modes *ever* being reloaded via core registers, since there
simply aren't enough of them for that to be possible. I think that may
be one of the reasons that integer-equivalent modes for each size have
traditionally been used by the compiler?).
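
The size arithmetic behind point (2) can be sketched as follows. This is only a simplified model of the constraint described above (a constant must fit in two HOST_WIDE_INTs), not the actual check in emit-rtl.c. The function name fits_in_double_const and the enum are made up for the example, and the host width is passed in as a parameter because it depends on the build.

```c
#include <stdbool.h>

/* Bit widths of two of the integer modes mentioned in this thread:
   TImode is 16 bytes and OImode is 32 bytes.  */
enum { TI_BITS = 128, OI_BITS = 256 };

/* Simplified model (an assumption, not GCC's code): a constant of
   mode_bits can be represented only if it fits in a pair of
   HOST_WIDE_INTs, each host_bits_per_wide_int bits wide.  */
bool fits_in_double_const(unsigned mode_bits, unsigned host_bits_per_wide_int)
{
    return mode_bits <= 2u * host_bits_per_wide_int;
}
```

With a 64-bit HOST_WIDE_INT the limit is 128 bits, so under this model a TImode constant fits but an OImode one does not. With a 32-bit HOST_WIDE_INT neither fits.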
...
...
...
The VMOV is a bit disappointing, and needs further investigation.
But that should be fairly easy: we just need to expand to subreg
operations for vcombine, vget_high/vget_low, etc. -- which I believe
will work fine now (it didn't at some point in the distant past,
which is why we have hardwired "vmov"s all over the place).
Big-endian mode may need some care to be taken, as ever.
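
As a rough illustration of the idea that vcombine and vget_high/vget_low are just views of parts of a wider value, here is a sketch in plain C using GCC's generic vector extension. The names combine and get_half are hypothetical, and memcpy at a byte offset stands in for the subreg operation; this is not how the backend implements anything.

```c
#include <stdint.h>
#include <string.h>

typedef int32_t v2si __attribute__((vector_size(8)));   /* 64-bit vector  */
typedef int32_t v4si __attribute__((vector_size(16)));  /* 128-bit vector */

/* Hypothetical analogue of vcombine: place lo at byte offset 0 and
   hi at byte offset 8 of the wide vector.  */
v4si combine(v2si lo, v2si hi)
{
    v4si q;
    memcpy((char *)&q, &lo, sizeof lo);
    memcpy((char *)&q + sizeof lo, &hi, sizeof hi);
    return q;
}

/* Hypothetical analogue of vget_low (byte_offset 0) and vget_high
   (byte_offset 8): read one 64-bit half of the wide vector.  */
v2si get_half(v4si q, unsigned byte_offset)
{
    v2si d;
    memcpy(&d, (const char *)&q + byte_offset, sizeof d);
    return d;
}
```

No arithmetic is involved in either direction, only a choice of byte offset, which is why a subreg is a natural fit. The sketch works purely in memory order; it says nothing about how lanes are numbered in registers, which is where the big-endian care comes in.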
This VMOV is coming from a plain register-register SET that we fail to
optimise away.  The pattern is:
    (set (reg NEWX) (reg OLDX))
    (set (subreg (reg NEWX 0)) (plus (subreg (reg OLDX 0)) ...)) ; OLDX dead
    (set (subreg (reg NEWX 8)) (plus (subreg (reg NEWX 8)) ...))
    ...
where what we really want is for the second instruction to use NEWX
(or for NEWX not to exist at all, whichever way you prefer to look at
it). I think it's a general subreg optimisation problem.
Hmm, not sure about that then.
...
...
I don't think I'm going to have time to work on this in the
immediate future: please feel free to use it as a base, or ignore
it if your approach is simpler/better :-).
Thanks.  I might well end up "borrowing" the vector-mode stuff.
Cheers,
Julian
