On Thu, Dec 02, 2010 at 10:54:32AM +0200, Ira Rosen wrote:
On 1 December 2010 17:57, Daniel Jacobowitz <dan@codesourcery.com> wrote:
On Wed, Dec 01, 2010 at 11:16:16AM +0200, Ira Rosen wrote:
The meaning of the builtin (or maybe a new tree code would be better?) is that the elements of v0, v1 and v2 are deinterleaved. I wanted the MEM_REFs, since we actually have three data accesses here, and something (builtin or tree code) to indicate the deinterleaving. Since the vectors are passed to the builtin, I don't think it's a problem if the statements get separated. When the expander sees the builtin, it has to remove the loads it created for the MEM_REFs and create a new "vector load multiple and deinterleave". Is that possible?
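To make the intended semantics concrete, here is a rough scalar model of the deinterleaving step applied to three already-loaded vectors. It is only a sketch: the name deinterleave3, the 4-element vector length and the int element type are assumptions for illustration, not anything that exists in GCC.

```c
#include <assert.h>
#include <stddef.h>

#define VLEN 4  /* elements per vector; assumed for illustration */

/* Hypothetical scalar model. v0, v1 and v2 together hold 3 * VLEN
   consecutive elements laid out as a0 b0 c0 a1 b1 c1 ...
   The function splits them into the three streams a, b and c. */
void deinterleave3(const int *v0, const int *v1, const int *v2,
                   int *a, int *b, int *c)
{
    assert(v0 && v1 && v2 && a && b && c);

    const int *in[3] = { v0, v1, v2 };
    for (size_t i = 0; i < VLEN; i++) {
        size_t k = 3 * i;  /* position of a_i in the concatenated input */
        a[i] = in[k / VLEN][k % VLEN];
        b[i] = in[(k + 1) / VLEN][(k + 1) % VLEN];
        c[i] = in[(k + 2) / VLEN][(k + 2) % VLEN];
    }
}
```

Note that each output vector draws elements from all three inputs, which is why the loads and the deinterleave need to be recognized together.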
This is a problem I've struggled with before. My only caution is that representing the MEM_REFs separately from the deinterleaving in the IR leaves all sorts of ways (many we haven't thought of yet) for them to get separated, and there's no instruction that efficiently implements the deinterleaving from registers. For instance, suppose a pseudo gets propagated into the builtin and we can't find the MEM_REFs any more. The resulting code could easily be worse than pre-vectorization.
I see. So one builtin for everything, like
vector_load_deinterleave (v0, v1, v2,..., stride,...)
is our only option?
It's not the only option; the way you've described might work, too.
But yes, it's my opinion that a single builtin is less likely to generate something the compiler can't recover from.
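
For illustration, a single combined operation could be modelled in scalar C roughly as below. This is a sketch under assumptions: load_deinterleave and its parameters are hypothetical names, and it reads directly from memory, so there are no separate loads that could drift away from the deinterleave.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of a combined "load and deinterleave".
   Reads stride * vlen consecutive ints starting at mem and writes
   element j of group i to out[j][i], so out[0] receives every
   stride-th element starting at mem[0], out[1] starting at mem[1],
   and so on. */
void load_deinterleave(const int *mem, size_t stride, size_t vlen,
                       int *const out[])
{
    assert(mem != NULL && out != NULL && stride > 0);

    for (size_t i = 0; i < vlen; i++)
        for (size_t j = 0; j < stride; j++)
            out[j][i] = mem[i * stride + j];
}
```

Because the memory access and the permutation are one operation here, there is nothing for later passes to separate; the cost is that the memory access is hidden inside the builtin rather than exposed as MEM_REFs.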