On 13 April 2011 18:48, Richard Sandiford richard.sandiford@linaro.org wrote:
I've now submitted the initial vldN and vstN work, so I thought I'd see how often it triggers for natty's libav package. I've put some initial results here:
https://wiki.linaro.org/RichardSandiford/Sandbox/NeonLibAv
There are more files to go through, so this isn't complete. I've also left out cases that were very similar to the ones given.
Some of the code is reasonable, while other parts are obviously not as good as they could be. I don't think the problems are really to do with the vldN and vstN work itself, though. They seem to be due to the underlying interleaved load/store detection, or to the handling of widening operations.
I'll take a closer look later; for now, WRT
Another example where we use interleaving poorly. We load &buf[i], &buf[i + 1], &buf[i + 2] and &buf[i + 3] into separate base registers, load 4 elements from each, but only use one element.
For such loads with gaps, the vectorizer indeed generates loads for every element of the group, hoping that DCE (dead code elimination) removes the unused ones later. That seemed to work until now.
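To make the gapped-load case concrete, here is a minimal C sketch. It is not the libav code from the wiki page: the function names and the stride of 4 are assumptions chosen to match the description above. The second function is only a scalar model of the strategy described (load every member of the group, use one, leave the rest for DCE), not what GCC actually emits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical loop with a gapped access: only every fourth element
   of buf is read. Requires at least 4 * (n - 1) + 1 elements. */
int32_t sum_stride4(const int16_t *buf, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[4 * i];
    return sum;
}

/* Scalar model of the strategy described above: the access is treated
   as a group of 4 interleaved members, all four are loaded, and only
   member 0 is used. The loads of m1..m3 are dead and are expected to
   be removed by DCE. Note that this version touches 4 * n elements. */
int32_t sum_stride4_all_loads(const int16_t *buf, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int16_t m0 = buf[4 * i + 0];
        int16_t m1 = buf[4 * i + 1]; /* dead load */
        int16_t m2 = buf[4 * i + 2]; /* dead load */
        int16_t m3 = buf[4 * i + 3]; /* dead load */
        (void)m1;
        (void)m2;
        (void)m3;
        sum += m0;
    }
    return sum;
}
```

Both functions return the same result. The difference is only in how many loads are issued if the dead ones survive to the final code, which is the inefficiency being discussed.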
Thanks, Ira
Richard
linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain