Re: NEON vectorization: use of specialized load/store instructions

14 Oct 2010


On Thu, 14 Oct 2010, Ira Rosen wrote:
...
> Let me check that I understand the problem first: the problem is that VLD1
> and VST1 instructions in big endian mode follow the array numbering of
> elements, while all other memory instructions (VLDR, VLDM, VSTR, VSTM) do
> not. So, do we have two problems here? The first one that VLD1/VST1 and
> VLDR, etc. can't be mixed in one computation. And the second one, that
> access to a single element is incorrect, when VLDR, etc. are used. Is that
> correct?
In terms of the native lane numbering used in NEON instructions, VLD1 and 
VST1 respect array ordering and are the instructions that can be used with 
single-element accesses, while the other instructions do not respect the 
ordering and cannot be so used without adjusting the element numbers.
In terms of the architecture-independent RTL semantics, VLDR, VLDM, VSTR 
and VSTM respect array ordering and can be used with single-element 
accesses, while VLD1 and VST1 do not respect the ordering and cannot be so 
used without adjusting element numbers.
The VLDR etc. order is the one required to be used for argument passing 
and return of vectors, and is the only one readily available when vectors 
are loaded/stored using core registers rather than NEON registers.
Thus, when generic RTL is generated from a NEON intrinsic (defined using 
native lane numbering) in big-endian mode, the lane number is adjusted to 
make the generic RTL correct, and when assembly code is generated from 
generic RTL, the reverse adjustment is made.
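The lane adjustment described above can be illustrated with a small host-side model. This is only a sketch under stated assumptions, not the code used in the compiler: it models a single 64-bit D register holding four 16-bit elements, and the names dreg_u16, model_vld1, model_vldr_be, adjust_lane and get_native_lane are hypothetical. It assumes the simple case where a big-endian VLDR of one D register places the lowest-addressed element in the highest native lane; quad-word registers loaded as two doublewords would need a per-half treatment that is not shown here.

```c
#include <assert.h>
#include <stdint.h>

#define NUNITS 4 /* four 16-bit elements in one 64-bit D register */

/* Hypothetical model of a D register: lanes[i] is native NEON lane i. */
typedef struct { uint16_t lanes[NUNITS]; } dreg_u16;

/* VLD1.16-style load: array element i lands in native lane i,
   independent of endianness. */
dreg_u16 model_vld1(const uint16_t *a)
{
    dreg_u16 r;
    for (int i = 0; i < NUNITS; i++)
        r.lanes[i] = a[i];
    return r;
}

/* VLDR-style load in big-endian mode: the 64 bits are loaded as one
   doubleword, so the element at the lowest address ends up in the
   most significant (highest-numbered) native lane. */
dreg_u16 model_vldr_be(const uint16_t *a)
{
    dreg_u16 r;
    for (int i = 0; i < NUNITS; i++)
        r.lanes[NUNITS - 1 - i] = a[i];
    return r;
}

/* Assumed adjustment between array index and native lane number.
   It is its own inverse, so the same function serves for both the
   intrinsic-to-RTL direction and the RTL-to-assembly direction. */
int adjust_lane(int lane, int big_endian)
{
    assert(lane >= 0 && lane < NUNITS);
    return big_endian ? NUNITS - 1 - lane : lane;
}

/* Read one native lane, as a single-element access would. */
uint16_t get_native_lane(dreg_u16 r, int lane)
{
    assert(lane >= 0 && lane < NUNITS);
    return r.lanes[lane];
}
```

In this model, array element i is found at native lane i after model_vld1, but at native lane adjust_lane(i, 1) after model_vldr_be, which is why the two kinds of load cannot be mixed in one computation without adjusting element numbers.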
...
> In addition, we need to think about how to represent VLD2/3, so the
> vectorizer can use them. Right?
Yes.  (I think code using arrays of red/green/blue values is the sort of 
real-world and benchmark code expected to be vectorized using VLD3.)
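As an illustration of the kind of code meant here, the following sketch shows a scalar loop over interleaved red/green/blue bytes, together with a scalar model of the de-interleaving that a VLD3.8 load into three D registers performs. The function names rgb_to_gray and model_vld3_u8 are hypothetical, and the plain average used for the gray value is an assumption chosen only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* eight 8-bit lanes in one 64-bit D register */

/* Scalar model of a VLD3.8-style load: 24 consecutive bytes holding
   eight {r, g, b} structures are split into three 8-lane vectors. */
void model_vld3_u8(const uint8_t *src,
                   uint8_t r[LANES], uint8_t g[LANES], uint8_t b[LANES])
{
    assert(src != NULL);
    for (int i = 0; i < LANES; i++) {
        r[i] = src[3 * i + 0];
        g[i] = src[3 * i + 1];
        b[i] = src[3 * i + 2];
    }
}

/* The kind of stride-3 loop a vectorizer would want to map onto
   de-interleaving loads: each iteration reads one {r, g, b} triple. */
void rgb_to_gray(const uint8_t *rgb, uint8_t *gray, size_t npixels)
{
    for (size_t i = 0; i < npixels; i++) {
        unsigned sum = rgb[3 * i] + rgb[3 * i + 1] + rgb[3 * i + 2];
        gray[i] = (uint8_t)(sum / 3); /* simple average, for illustration */
    }
}
```

Once the three channels sit in separate vectors, the body of rgb_to_gray becomes ordinary element-wise arithmetic on r, g and b, which is what makes a de-interleaving load attractive for this access pattern.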
...
> Joseph Myers joseph@codesourcery.com wrote on 08/10/2010 02:54:29 AM:
...
> > Make it possible to describe in generic RTL a permuting
> > vector load whose alignment requirement is element alignment, describe
> > vld1 that way, and teach the vectorizer how to use such loads and stores.
> Does that mean that the vectorizer will be aware of specific instructions?
I would imagine that it would need to know what permutations are 
available, yes (GIMPLE and RTL would have some form of general permuting 
load/store operation, which the vectorizer would only generate where 
relevant instructions exist for the chosen permutation).
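One way to picture "generate only where relevant instructions exist" is a query that the vectorizer makes before choosing a permuting access. The sketch below is purely hypothetical: permuting_access, target_supports_permuting_access, choose_strategy and the strategy enum are invented names for illustration and do not correspond to any existing GCC interface. The support table assumes the NEON structure loads and stores VLD1 to VLD4 and VST1 to VST4, with 8-, 16- and 32-bit elements for the interleaving forms and additionally 64-bit elements for the non-interleaving form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a de-interleaving access: 'stride' is the
   number of elements per structure (3 for r/g/b data). */
typedef struct {
    int stride;
    int elem_bits;
} permuting_access;

typedef enum { USE_PERMUTING_ACCESS, USE_FALLBACK } access_strategy;

/* Hypothetical target query, modelled on NEON VLDn/VSTn availability. */
bool target_supports_permuting_access(const permuting_access *p)
{
    assert(p != NULL);
    bool small_elem = p->elem_bits == 8 || p->elem_bits == 16
                      || p->elem_bits == 32;
    if (p->stride == 1)
        return small_elem || p->elem_bits == 64;
    if (p->stride >= 2 && p->stride <= 4)
        return small_elem;
    return false; /* no instruction for this permutation */
}

/* The vectorizer side: emit the general permuting operation only when
   the target reports a matching instruction. */
access_strategy choose_strategy(const permuting_access *p)
{
    return target_supports_permuting_access(p) ? USE_PERMUTING_ACCESS
                                               : USE_FALLBACK;
}
```

With this shape, the vectorizer needs to know which permutations are available, but not the instruction names themselves; those stay behind the target query.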
-- 
Joseph S. Myers
joseph@codesourcery.com
