Julian Brown <julian@codesourcery.com> wrote on 11/10/2010 04:29:15 PM:
In further followups (at the risk of misrepresenting Joseph & Paul Brook's opinions!), there seemed to be general agreement that a scheme something like the one outlined below, with "permuting" loads/stores and some way of handling multiple in-register layouts for vectors, will be a necessary addition to the vectorizer going forward.
Hi,
Let me check that I understand the problem first: in big-endian mode, the VLD1 and VST1 instructions follow the array numbering of elements, while all other memory instructions (VLDR, VLDM, VSTR, VSTM) do not. So do we have two problems here? The first is that VLD1/VST1 and VLDR etc. can't be mixed in one computation. The second is that accessing a single element gives the wrong result when VLDR etc. are used. Is that correct? In addition, we need to think about how to represent VLD2/3 so that the vectorizer can use them. Right?
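To make the two problems concrete, here is a small Python model of how the lanes of a 64-bit D register get filled on a big-endian core. It is an illustration, not ARM's or GCC's definition. It assumes the usual convention that lane 0 occupies the least significant bits of the register. It also assumes that VLDR reads the doubleword as a single big-endian 64-bit value, while VLD1 reads each element separately. The function names are hypothetical.

```python
from __future__ import annotations

import struct


def lanes_vldr_be(mem: bytes, esize: int = 2) -> list[int]:
    """Model of VLDR on a big-endian core: the 8 bytes are read as one
    big-endian 64-bit value; lane i is the i-th esize-byte field counting
    from the least significant end."""
    d = int.from_bytes(mem[:8], "big")
    bits = esize * 8
    mask = (1 << bits) - 1
    return [(d >> (bits * i)) & mask for i in range(8 // esize)]


def lanes_vld1_be(mem: bytes, esize: int = 2) -> list[int]:
    """Model of VLD1.<esize*8> on a big-endian core: element i is read from
    byte offset esize*i (big-endian) and placed in lane i."""
    return [int.from_bytes(mem[esize * i: esize * (i + 1)], "big")
            for i in range(8 // esize)]


# int16_t a[4] = {1, 2, 3, 4} as laid out in big-endian memory.
mem = struct.pack(">4h", 1, 2, 3, 4)
print("VLDR lanes:", lanes_vldr_be(mem))  # [4, 3, 2, 1]
print("VLD1 lanes:", lanes_vld1_be(mem))  # [1, 2, 3, 4]
```

Under this model both problems show up directly. A value loaded with VLDR and stored with VST1 comes out reversed. Extracting lane 0 of the VLDR-loaded value yields a[3] rather than a[0].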
I'm thinking (without having much idea about how feasible such an idea is) of something along the lines of a function (in the mathematical sense) attached to each vector value manipulated by the vectorizer, to map that value's element numberings to and from memory offsets.
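One way to picture such a function is as a permutation stored alongside each vector value, mapping lane numbers to array element indices and back. The sketch below is hypothetical: `LaneLayout`, `shuffle_between` and the other names are invented for illustration and do not describe existing vectorizer code. It shows how two values with different layouts would need an explicit lane shuffle before they can be combined or stored.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LaneLayout:
    """Hypothetical annotation attached to a vector value: perm[lane] is the
    array element index (memory offset / element size) held in that lane."""
    perm: tuple[int, ...]

    def to_memory(self, lane: int) -> int:
        """Lane number -> element index in memory."""
        return self.perm[lane]

    def to_lane(self, elem: int) -> int:
        """Element index in memory -> lane number."""
        return self.perm.index(elem)


def identity_layout(n: int) -> LaneLayout:
    return LaneLayout(tuple(range(n)))


def reversed_layout(n: int) -> LaneLayout:
    return LaneLayout(tuple(reversed(range(n))))


def shuffle_between(src: LaneLayout, dst: LaneLayout) -> tuple[int, ...] | None:
    """Lane shuffle s (out[lane] = in[s[lane]]) that converts a value from
    layout src to layout dst, or None if the layouts already agree."""
    if src == dst:
        return None
    return tuple(src.to_lane(dst.to_memory(lane)) for lane in range(len(dst.perm)))


def apply_shuffle(s: tuple[int, ...], lanes: list[int]) -> list[int]:
    return [lanes[i] for i in s]


# Lanes of int16_t a[4] = {1, 2, 3, 4} after a VLDR-style big-endian load:
# lane l holds element 3 - l, i.e. the reversed layout (assumed here).
vldr_lanes, vldr_layout = [4, 3, 2, 1], reversed_layout(4)
vst1_layout = identity_layout(4)  # VST1 stores lane i to element i

s = shuffle_between(vldr_layout, vst1_layout)
print(s)                             # (3, 2, 1, 0)
print(apply_shuffle(s, vldr_lanes))  # [1, 2, 3, 4]
# Extracting "element 0" must read lane to_lane(0), not lane 0:
print(vldr_lanes[vldr_layout.to_lane(0)])  # 1
```

With the layout carried explicitly, the vectorizer would only insert a permute where two layouts actually disagree. When they already match, `shuffle_between` returns `None` and no extra instruction is needed.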
Joseph Myers <joseph@codesourcery.com> wrote on 08/10/2010 02:54:29 AM:
Make it possible to describe in generic RTL a permuting vector load whose alignment requirement is element alignment, describe vld1 that way, and teach the vectorizer how to use such loads and stores.
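As a rough illustration of what such a description would have to capture, here is a hypothetical Python model of a "permuting" load. It requires only element alignment and takes the lane-to-element permutation as an explicit parameter. vld1 then corresponds to the identity permutation. The function name and signature are invented for this sketch; the real proposal would be expressed in RTL, not Python.

```python
from __future__ import annotations

import struct


def permuting_load(mem: bytes, offset: int, esize: int, nlanes: int,
                   perm: tuple[int, ...]) -> list[int]:
    """Hypothetical semantics of a "permuting" vector load.

    Only element alignment is required (offset % esize == 0). The nlanes
    elements starting at offset are read in big-endian order, and lane l
    receives array element perm[l].
    """
    if offset % esize != 0:
        raise ValueError("permuting load requires element alignment")
    elems = [int.from_bytes(mem[offset + esize * i: offset + esize * (i + 1)], "big")
             for i in range(nlanes)]
    return [elems[perm[lane]] for lane in range(nlanes)]


mem = struct.pack(">6h", 10, 1, 2, 3, 4, 20)
# vld1.16 described as the identity permutation, from a 2-byte-aligned address:
print(permuting_load(mem, 2, 2, 4, (0, 1, 2, 3)))  # [1, 2, 3, 4]
# A reversing permutation gives the lane order the VLDR model above produces
# for 16-bit elements in a 64-bit register:
print(permuting_load(mem, 2, 2, 4, (3, 2, 1, 0)))  # [4, 3, 2, 1]
```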
Does that mean that the vectorizer will be aware of specific instructions?
I can see several places where the order of elements is important in the vectorizer's code generation:
- interleave_high/low and widening operations (but I am not sure that the current implementation suits NEON best, so maybe those are less important)
- extraction of scalar result in reduction
The ARM implementations of reduction operations fortuitously calculate the results across all elements simultaneously, so when one of those elements is extracted, we still get the right answer.
So, does that mean that's not a problem?
- various scalar/invariant vectors, including initializations for reduction and induction
- the order of elements in loads and stores should match
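The reduction and induction items can be contrasted with a small model. It assumes, purely for illustration, that the reduction is built from repeated VPADD-style pairwise adds, which leave the total in every lane; this is not a claim about the code GCC actually emits. It then shows that an induction vector built in lane order only lines up with the data when the data is in the identity layout.

```python
from __future__ import annotations


def vpadd(n: list[int], m: list[int]) -> list[int]:
    """Model of a NEON VPADD-style pairwise add: the low half of the result
    holds adjacent-pair sums of n, the high half adjacent-pair sums of m."""
    return ([n[i] + n[i + 1] for i in range(0, len(n), 2)]
            + [m[i] + m[i + 1] for i in range(0, len(m), 2)])


def reduce_sum_all_lanes(v: list[int]) -> list[int]:
    """Repeated vpadd(v, v); for a power-of-two lane count this leaves the
    total in every lane."""
    for _ in range(len(v).bit_length() - 1):
        v = vpadd(v, v)
    return v


data = [1, 2, 3, 4]       # array elements 0..3
vld1_lanes = data         # lane i holds element i
vldr_lanes = data[::-1]   # big-endian VLDR model: lane i holds element 3 - i

# Reduction: every lane ends up with the total, so extracting lane 0 is
# correct whichever layout the loads produced.
print(reduce_sum_all_lanes(vld1_lanes)[0], reduce_sum_all_lanes(vldr_lanes)[0])  # 10 10

# Induction: an initial vector {0, 1, 2, 3} built in lane order pairs each
# element with its own index only in the identity layout.
induction = [0, 1, 2, 3]
print([x * i for x, i in zip(vld1_lanes, induction)])  # [0, 2, 6, 12]
print([x * i for x, i in zip(vldr_lanes, induction)])  # [0, 3, 4, 3]
```

In the reversed case, lane 0 holds a[3] but is multiplied by index 0. So invariant and induction vectors would need to be built in the same layout as the data they combine with.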
Thanks, Ira